ABSTRACT

A preliminary survey of 32 school districts identified as having
highly developed teacher evaluation systems was followed by the
selection of 4 case study districts (Salt Lake City, Utah; Lake
Washington, Washington; Greenwich, Connecticut; and Toledo, Ohio)
representing diverse teacher evaluation processes and organizational
environments. Common factors found to underlie the success of the
evaluation systems in each case study district were: demonstration of
organizational commitment to teacher evaluation, procedures for
ensuring evaluator competence, collaboration with the teachers'
organization and individual teachers, and compatibility of teacher
evaluation with other district management strategies. The conclusions
reached, to be modified on the basis of local experiences, are that a
successful teacher evaluation system should: (1) suit the educational
goals, management style, conception of teaching, and community values
of the school district; (2) have top-level commitment to, and
resources for, evaluation; (3) match the purpose of the district;
(4) have efficient use of resources to achieve reliability, validity,
and cost-effectiveness; and (5) have teacher involvement and
responsibility. (MLF)

PREFACE



This report describes a 1983 Rand study of teacher evaluation
practices. The study was financed by the National Institute of
Education, which correctly predicted the growing interest in improving
teacher evaluation. The report should be of interest to those
initiating or revising teacher evaluation procedures.

School systems evaluate teachers to facilitate decisions about
teacher status and to help teachers improve their performance. Most
existing literature on teacher evaluation concerns evaluation
instruments and ways to improve the technical reliability and validity
of such instruments (that is, how consistently and how accurately they
measure teaching performance).^

The present study focused on the actual operation of teacher evalua- 
tion procedures in school systems. It examined not only the instru- 
ments and procedures, but also the implementation processes and the 
organizational contexts within which they operate. This approach 
enabled the authors to observe whether and how teacher evaluation 
results are used by the organization. It also indicated the broader orga- 
nizational conditions needed to initiate and sustain effective teacher 
evaluation practices. 

A panel composed of representatives of education and education- 
related organizations advised the study. The panel included: 

Dr. Gordon Cawelti, Executive Director, Association for Super- 
vision and Curriculum Development 

Dr. Susan S. Ellis, Teacher Leader for Staff Development, 
Greenwich (Connecticut) Public Schools (representing the 
National Staff Development Council) 

Ms. Anita Epstein, Governmental Affairs Director, National 
Association of State Boards of Education 

Dr. Jeremiah Floyd, Associate Executive Director, Office of
Communications and Membership Relations, National School
Boards Association

Dr. David G. Imig, Executive Director, American Association of
Colleges for Teacher Education



^Linda Darling-Hammond, Arthur E. Wise, and Sara R. Pease, "Teacher
Evaluation in the Organizational Context: A Review of the Literature,"
Review of Educational Research, Fall 1983.

Dr. James Keefe, Director of Research, National Association of
Secondary School Principals

Ms. Lucille Maurer, Member, Maryland House of Delegates
(representing the National Conference of State Legislatures)

Dr. Bernard McKenna, Program Development Specialist,
National Education Association

Ms. Margaret Montgomery, Professional Development Specialist,
National Association of Elementary School Principals

Dr. Reuben Pierce, Acting Assistant Superintendent for Quality
Assurance, District of Columbia Public Schools

Dr. William Pierce, Executive Director, Council of Chief State
School Officers

Ms. Marilyn Rauth, Director, Educational Issues Department,
American Federation of Teachers

Dr. Robert W. Peebles, Superintendent of Schools, Alexandria
(Virginia) City Public Schools (representing the American
Association of School Administrators).

The involvement of the panel was meant to encourage a study and
report that would be relevant to groups with a stake in teacher
evaluation. The panel advised on the research plan, helped to identify
school districts with highly developed teacher evaluation procedures,
and commented on drafts of the report. The participation of these
panel members, however, does not necessarily imply their endorsement
of the report's conclusions.

The panel advised that the report be kept short so that it would be
widely read. Following this advice, the authors present in this volume
only their findings, analyses, conclusions, and recommendations. The
four case studies that provided most of the data for the report are
summarized here; they are also being published separately as Case
Studies for Teacher Evaluation: A Study of Effective Practices,
N-2133-NIE, June 1984.
SUMMARY



The new concern for the quality of education and of teachers is
being translated into merit-pay, career-ladder, and master-teacher
policies that presuppose the existence of effective teacher evaluation
systems. As a result, many school districts will be reassessing their
teacher evaluation practices.

School district administrators must understand the educational and
organizational implications of the teacher evaluation system that they
adopt, because that system can define the nature of teaching and
education in their schools. In particular, the system can either
reinforce the idea of teaching as a profession, or it can further
deprofessionalize teaching, making it less able to attract and retain
talented teachers.

FRAMEWORK OF TEACHER EVALUATION 

Teacher evaluation may serve four basic purposes: individual staff
development, school improvement, individual personnel decisions, and
school status decisions. The first two purposes involve improvement;
the second two, accountability. Although many teacher evaluation
systems may seek to accomplish all four of these purposes, different
processes and methods may better suit individual objectives. In
particular, improvement and accountability require different standards
of adequacy and evidence.

For purposes of accountability, teacher evaluation processes must be
capable of yielding fairly objective, standardized, and externally
defensible information about teacher performance. For improvement
objectives, evaluation processes must yield descriptive information
that illuminates sources of difficulty, as well as viable courses for
change.

To improve a teacher's performance, the school system must enlist
the teacher's cooperation, motivate him (or her), and guide him
through the steps to improvement. For the individual, improvement
relies on the development of two important conditions: the knowledge
that a course of action is the correct one and a sense of empowerment
or efficacy, that is, a perception that pursuing a given course of
action is both worthwhile and possible.

The implementation of any school policy, including a teacher
evaluation policy, represents a continuous interplay among diverse
policy goals, established rules and procedures (concerning both the
policy in question and other aspects of the school's operation),
intergroup bargaining and value choices, and the local institutional
context. The political climate of the school system, the relationship
of the teachers' organization to district management, the nature of
other educational policies and operating programs in the district, and
the size and structure of the system and its bureaucracy all influence
teacher evaluation procedures.

SURVEY OF EVALUATION PRACTICES 
IN 32 SCHOOL DISTRICTS 

We undertook this study to find teacher evaluation processes that
produce information that school districts can use for helping teachers
to improve and/or for making personnel decisions. The study began
with a review of the literature and a preliminary survey of 32 districts
identified as having highly developed teacher evaluation systems.

Teacher evaluation practices differed substantially in the 32 school 
districts. Although the practices seemed similar in broad outline, they 
diverged as local implementation choices were made. Our preliminary 
assessment led us to conclude that school authorities do not agree on 
what constitutes the best practice with regard to instrumentation, fre- 
quency of evaluation, the role of the teacher in the process, or how the 
information could or should inform other district activities. These 
differences in practices, we believe, indicate that teacher evaluation 
presently is an underconceptualized and underdeveloped activity. 

Despite differences in level of development and diversity of local
implementation choices, the major problems associated with teacher
evaluation practices were similar in the 32 districts surveyed. Almost
all survey respondents felt that principals lacked sufficient resolve
and competence to evaluate accurately. Other problems included teacher
resistance or apathy, the lack of uniformity and consistency of
evaluation within a school system, inadequate training for evaluators,
and shortcomings in the evaluation of secondary school staff and
specialists.

Respondents consistently reported two positive results of teacher 
evaluation: improved teacher-administrator communication and in- 
creased teacher awareness of instructional goals and classroom prac- 
tices. In most of the 32 districts, the teacher evaluation system has led 
to personnel actions. Although few districts used evaluation outcomes 
to terminate tenured staff, nontenured staff were dismissed on the 
basis of evaluation in most sample districts. 
CASE STUDY FINDINGS: FOUR SUCCESSFUL
EVALUATION SYSTEMS

From among the 32 survey districts, we selected four case study
districts representing diverse teacher evaluation processes and
organizational environments: Salt Lake City, Utah; Lake Washington,
Washington; Greenwich, Connecticut; and Toledo, Ohio. We spent a week
in each district interviewing the superintendent and other top
administrators, officers of the local teachers' organization, school
board members, parents, and community representatives. Visiting six
schools of different types in each district, we interviewed
principals, specialized personnel, and at least six teachers.

The four case study districts approach the task of teacher evaluation
in different ways. Their approaches vary with respect to the primary
evaluators and the teachers who are evaluated. They also differ with
respect to the major purposes of evaluation, the instruments used, the
processes by which evaluation judgments are made, and the linkage
between teacher evaluation and other school district activities, such as
staff development and instructional management. Finally, districts
represent dramatically different contexts for teacher evaluation in
terms of student population, financial circumstances, and political
environment.

Despite these differences in form, the four districts follow certain
common practices in implementing their teacher evaluation systems.
These commonalities in implementation, in fact, set the systems apart
from the less successful ones and suggest that implementation factors
contributing to the success of these systems may also contribute to the
success of others.

Specifically, these districts provide top-level leadership and institu- 
tional resources for the evaluation process, ensure that evaluators have 
the necessary expertise to perform their task, encourage teachers and 
administrators to collaborate to develop a common understanding of 
evaluation goals and processes, and use an evaluation process and sup- 
port systems that are compatible with each other and with the 
district's overall goals and organizational context. 

Attention to these four factors (organizational commitment, evaluator
competence, teacher-administrator collaboration, and strategic
compatibility) has elevated evaluation in these districts from what is
often a superficial exercise to a meaningful process that produces
useful results. With regard to commitment, all four case study
districts recognize that the key obstacle to successful evaluation is
time, or more precisely the lack of it, for observing, conferring
with, and, especially, assisting teachers who most need intensive
help. These districts create time for evaluation.
Evaluator competence, probably the most difficult element of the
process, requires two qualities: the ability to make sound judgments
about teaching quality and the ability to make appropriate, concrete
recommendations for improvement of teaching performance. Supervision
of the evaluation process provides the most important check on
evaluator competence. All four districts have mechanisms for verifying
the accuracy of evaluators' reports about teachers. These mechanisms
force evaluators to justify their ratings in precise, concrete terms.

In the four case study districts, the teachers' organization has
collaborated with the school administration in the design and
implementation of the teacher evaluation process. The extent and
nature of the collaboration between teachers and administrators in the
four districts varies, but all have means for maintaining
communication about evaluation so that implementation problems may be
addressed as they occur.

In each case study district, teacher evaluation supports and is sup- 
ported by other key operating functions in the schools. Evaluation is 
not just an ancillary activity; it is part of a larger strategy for school 
improvement. 

EVALUATING THE TEACHER EVALUATION SYSTEMS 

The four case study teacher evaluation systems succeed in several
ways. First, and relatively atypically, the school systems implement
them as planned. Second, all actors in the system understand them.
Third, the school systems actually use the results. In varying degrees,
the evaluation processes produce reliable, valid measures of teaching
performance and are used for teacher improvement and personnel
decisions.

Reliability in evaluation refers to the consistency of measurements 
across evaluators and observations. The degree of reliability required 
of a teacher evaluation system depends on the use to be made of the 
results. Personnel decisions demand the highest reliability of evalua- 
tion results. Evaluation criteria must be standardized and evaluators 
must apply these criteria with consistency when the results are to be 
used for personnel decisions regarding tenure, dismissal, pay, and pro- 
motion. The evaluation system may tolerate a lower degree of reliabil- 
ity when the results are to be used for formative assessments or infor- 
mational purposes. 
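
As an illustration only, consistency across evaluators can be
quantified when two evaluators rate the same teachers. The sketch
below computes simple percent agreement and Cohen's kappa, a standard
chance-corrected agreement statistic. The report itself does not
prescribe any statistic; the function names, the rating labels ("S"
for satisfactory, "U" for unsatisfactory), and the sample ratings are
all hypothetical.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of teachers on whom two evaluators gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two evaluators, corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: probability both evaluators pick the same category
    # if each rated independently at their own observed category rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both evaluators used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of the same eight teachers by two evaluators.
evaluator_1 = ["S", "S", "U", "S", "U", "S", "S", "U"]
evaluator_2 = ["S", "S", "U", "S", "S", "S", "U", "U"]

print(percent_agreement(evaluator_1, evaluator_2))
print(cohens_kappa(evaluator_1, evaluator_2))
```

A district using results for personnel decisions would want agreement
of this kind to be high; for formative purposes a lower value may be
tolerable, as the paragraph above notes.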

At least three sources of variability may make teacher evaluation
unreliable: (1) variability in how evaluators interpret what they
observe or what criteria they stress in making judgments; (2)
variability in the evaluations of a single evaluator, i.e., whether the
evaluator uses the same criteria and applies them consistently when
observing different teachers; and (3) variability in observations, i.e.,
whether the evaluator uses the same criteria and applies them in the
same manner when observing the same teacher on separate occasions.

Toledo's evaluation process addresses all of these potential sources
of unreliability by using a small number of evaluators, a reporting
process that fosters common assessment criteria and applications, and
frequent observation and consultation. More important, the consulting
teachers discuss their observations and evaluations with a review panel
several times a year. Finally, the Toledo process increases reliability
by limiting the number of teachers to be evaluated and by allowing the
small group of expert teachers who evaluate them released time.

The Lake Washington, Greenwich, and Salt Lake City teacher 
evaluation processes require an administrator to evaluate every teacher 
every year. This requirement decreases evaluation reliability by 
increasing the chances of variability among evaluators and variability 
across evaluations and observations. Evaluator training helps to offset 
these sources of unreliability to varying degrees in the three districts. 

The validity of a teacher evaluation process depends on its accuracy
and comprehensiveness in assessing teaching quality as defined by the
agreed-on criteria. Although school districts may seek to finesse the
issue of validity by striving for measurement reliability in their
evaluation process, they cannot ignore the validity of the process
when they use its results as a basis for personnel decisions.

The criteria, the process for collecting data, and the competence of
the evaluator contribute to the validity of an evaluation process. The
purpose of evaluation (the inference to be drawn, the help to be given,
the decision to be made) determines the validity of the evaluation
process. In short, the process must suit the purpose if the results
are to be judged valid.

The criteria for judging minimal competence must be standardized, 
generalizable, and uniformly applied. Finer distinctions among good, 
better, and outstanding teachers require nonstandardized, i.e., differen- 
tial, criteria. 

To evaluate minimum competence, the evaluator must be able to
observe the presence or absence of generic teaching skills. However, to
evaluate the appropriateness of teaching decisions, the evaluator must
know the subject matter, the pedagogy, and the classroom
characteristics of the teacher being evaluated. The evaluator's level
of expertise must at least equal, if not exceed, that of the teacher
being evaluated.

In Salt Lake City, Lake Washington, and Toledo, the presence or
absence of minimal teaching competence, especially the inability to
manage the classroom, triggers immediate help. Principals generally
admit, however, that they spend little time evaluating teachers who
appear to be competent, and teachers not subject to special help allege
that their evaluations have not given them constructive criticism
relevant to their area of teaching expertise.

To increase the validity of evaluators' judgments, all four evaluation
processes require careful documentation of teaching behaviors resulting
in unsatisfactory ratings. This documentation enables someone other
than the evaluator to verify that the teaching criteria have been
applied appropriately. In addition, they require multiple observations
for evaluations. The Salt Lake, Toledo, and Lake Washington
processes provide explicitly for multiple observations and devote
resources in the form of evaluator time to that end.

Toledo and Lake Washington have taken aggressive steps to ensure
validity. Toledo chooses as evaluators consulting teachers who are 
recognized by their peers and administrators as experts in their teach- 
ing areas. The consultants are matched by teaching area to the teach- 
ers they evaluate. Lake Washington trains evaluators in the same 
teaching principles that guide teacher staff development. This training 
enhances the correlation between the evaluators' judgments and the 
standard of practice adopted by the district. 

Salt Lake City enhances validity indirectly by referring
decisionmaking to a committee containing two experts. The validity of
evaluation judgments rests on the consensus of the committee. The
presence of a learning specialist and a teacher from the relevant
subject area or grade level on the committee increases the prospect
that defensible inferences about teacher competence are made.

The evaluation of relative competence must take into account the
probable short- and long-run consequences of teaching behaviors and
the substantive basis for teaching judgments. This type of evaluation
depends on high-inference variables, which require the judgment of an
expert observer.

Greenwich is distinguished by its emphasis on evaluating degrees of 
competence as it seeks to help teachers improve their performance. 
The validity of Greenwich's process rests on its ability to appropriately 
diagnose the individual teacher's needs and to accurately gauge prog- 
ress toward more competent performance in the areas so identified. 
The Greenwich process continues to be relevant as the teacher acquires 
the ability to make professional judgments.

The utility of teacher evaluation depends in part on its reliability
and validity, that is, on how consistently and accurately the process
measures minimal competence and degrees of competence. The utility
of evaluation depends also on its cost, that is, on whether it achieves
usable outcomes without generating excessive costs. The results must
be worth the time and effort used to obtain them if the process is to
survive competing organizational demands. At least three types of
costs (logistic, financial, and political) should be considered in
assessing utility.

Utility represents a proper balance of costs and benefits. The bene- 
fits include the provision of data for decisionmaking, improved com- 
munication, and personnel improvement. 

Toledo's evaluation process has high utility. It succeeds in helping
teachers to achieve acceptable teaching competence, or in removing
them from the classroom if they do not. It does both without
disrupting the system's operations or lowering the morale of school
personnel.

Three critical features ensure the utility of the Toledo process: (1)
It is carefully managed, and it is conducted by evaluators who have no
other, competing responsibilities; (2) it is focused and it uses limited
resources to reach a carefully defined subset of teachers; and (3) it is a
collaborative effort and it engages the key political actors in the design,
implementation, and ongoing redesign of the process. Moreover, it
shows a relatively low overall cost and provides substantial substantive
and political benefits.

Salt Lake City's evaluation process, like Toledo's, has fairly high
utility for accountability purposes. The utility of Lake Washington's
teacher evaluation process for identifying, assisting, and, if necessary,
removing incompetent teachers from the classroom is also fairly high.

The Greenwich system not only enables the school system to engage
the individual teacher, it does so in a manner that relates directly to
the teacher's daily professional endeavors. Thus, the utility of the
Greenwich evaluation process results from its ability to tap teacher
motivation and desire for self-improvement and to reward teachers'
efforts by acknowledging their importance.

CONCLUSIONS AND RECOMMENDATIONS 

Our conclusions and recommendations constitute a set of necessary,
but not sufficient, conditions for successful teacher evaluation. In
practice, educational policies and procedures must be tailored to local
circumstances. Consequently, these conclusions and recommendations
may best be thought of as heuristics, or starting strategies to be
modified on the basis of local experience.

Conclusion One: 

To succeed, a teacher evaluation system must suit the educational
goals, management style, conception of teaching, and community
values of the school district.

Recommendations:

1. The school district should examine its educational goals,
management style, conception of teaching, and community
values and adopt a teacher evaluation system compatible with
them. It should not adopt an evaluation system simply because
that system works in another district.

2. States should not impose highly prescriptive teacher
evaluation requirements.

Conclusion Two: 

Top-level commitment to and resources for evaluation outweigh 
checklists and procedures. 

Recommendations: 

3. The school district should give sufficient time, unencumbered
by competing administrative demands, for evaluation. This
may mean assigning staff other than the school principal to
some evaluation functions.

4. The school district should regularly assess the quality of
evaluation, including individual and collective evaluator
competence. The assessments should provide feedback to
individual evaluators and input into the continuing evaluator
training process.

5. The school district should train evaluators in observation and
evaluation techniques, including reporting, diagnosis, and
clinical supervision skills, when it adopts a new teacher
evaluation process.

Conclusion Three: 

The school district should decide the main purpose of its teacher 
evaluation system and then match the process to the purpose. 

Recommendations: 

6. The school district should examine its existing teacher evalua- 
tion system to see which, if any, purpose it serves well. If the 
district changes the purpose, it should change the process. 

7. The school district should decide whether it can afford more
than one teacher evaluation process or whether it must choose
a single process to fit its main purpose.

Conclusion Four: 

To sustain resource commitments and political support, teacher 
evaluation must be seen to have utility. Utility depends on the effi- 
cient use of resources to achieve reliability, validity, and cost- 
effectiveness. 

Recommendations: 

8. The school district must allocate resources commensurate with 
the number of teachers to be evaluated and the importance 
and visibility of evaluation outcomes. 

9. The school district should target resources so as to achieve 
real benefits. 

Conclusion Five: 

Teacher involvement and responsibility improve the quality of 
teacher evaluation. 

Recommendations:

10. The school district should involve expert teachers in the 
supervision and assistance of their peers, particularly begin- 
ning teachers and those in need of special assistance. 

11. The school district should involve teacher organizations in
the design and oversight of teacher evaluation to ensure its
legitimacy, fairness, and effectiveness.

12. The school district should hold teachers accountable to stan- 
dards of practice that compel them to make appropriate 
instructional decisions on behalf of their students. 
I. INTRODUCTION



THE IMPORTANCE OF TEACHER EVALUATION 

A well-designed, properly functioning teacher evaluation process pro- 
vides a major communication link between the school system and 
teachers. On the one hand, it imparts concepts of teaching to teachers 
and frames the conditions of their work. On the other hand, it helps 
the school system to structure, manage, and reward the work of teach- 
ers. 

Teacher evaluation attracted new interest in April 1983, when the
National Commission on Excellence in Education published A Nation
at Risk: The Imperative for Educational Reform. Several of the
commission's recommendations concerned with teaching would require
teacher evaluation:

Persons preparing to teach should be required to meet high
educational standards, to demonstrate an aptitude for teaching, and to
demonstrate competence in an academic discipline. . . . Salaries for
the teaching profession should be increased and should be
professionally competitive, market-sensitive, and performance-based.
Salary, promotion, tenure, and retention decisions should be tied to
an effective evaluation system that includes peer review so that
superior teachers can be rewarded, average ones encouraged, and poor
ones either improved or terminated (p. 30).

President Reagan's endorsement of merit pay thrust the commission's 
recommendations into the limelight and, with them, the need for a 
careful examination of teacher evaluation practices. 

Action for Excellence, the June 1983 report of the Task Force on 
Education for Economic Growth, Education Commission of the States 
(ECS), echoed some of the Excellence Commission's recommendations: 

We recommend that boards of education and higher education in
each state, in cooperation with teachers and school administrators,
put in place, as soon as possible, systems for fairly and
objectively measuring the effectiveness of teachers and rewarding
outstanding performance.

We strongly recommend that the states examine and tighten their
procedures for selecting not only those who come into teaching, but
also those who ultimately stay. . . . Ineffective teachers (those who
fall short repeatedly in fair and objective evaluations) should, in due
course and with due process, be dismissed (p. 39).^

The ECS recommendations reveal a strong preoccupation with
teacher competence. At the same time, they stress the importance of
"a new and higher regard for teachers and for the profession of
teaching" (p. 37).

Education policymakers increasingly consider better teachers and
better teaching the key to better education. The Excellence
Commission, seeking ways to improve the quality of education,
recommended improving the quality of teachers. Exploring ways to
restructure education to benefit economic growth, ECS also advocated
better teachers. In September 1983, the Commission on Precollege
Education in Mathematics, Science and Technology of the National
Science Board, in its report, Educating Americans for the 21st
Century, again stressed the quality of teachers and teaching.

As unremarkable as this consensus now seems, it reverses educa- 
tional policy trends of the past two decades. The teacher-proof curric- 
ulum, test-based instructional management, and student competence 
testing initiatives were all based on the premise that education could 
be improved without improving the quality of teachers. 

Teacher evaluation constitutes an important aspect of quality
improvement. But improving the quality of teachers and of teaching
requires more than evaluation. It requires attracting highly able
students to teaching, preparing them to teach, ascertaining that they
can teach, providing an environment in which they can teach,
motivating them to teach, and persuading them to remain in teaching.
At the same time, quality improvement requires the introduction of
quality-control mechanisms that do not distort the educational process
in unintended and undesirable ways.

Proper teacher evaluation can determine whether new teachers can 
teach, help all teachers to improve, and indicate when a teacher can or 
will no longer teach effectively. We found, however, that teacher 
evaluation, properly done, is a difficult undertaking. As the results of 
teacher evaluation are put to broader uses, we may expect that the dif- 
ficulties associated with teacher evaluation will increase. 

The new concern for the quality of education and of teachers is 
being translated into merit-pay, career-ladder, and master-teacher poli- 
cies that presuppose the existence of effective teacher evaluation sys- 
tems. Many school districts will be reassessing their teacher evaluation 
practices; certainly, they will be paying more attention to them. 
School district personnel must understand the educational and
organizational implications of the teacher evaluation system that they
adopt, because that system can define the nature of teaching and
education in their schools. In particular, the system can either reinforce
the idea of teaching as a profession, or it can further deprofessionalize
teaching, making it less able to attract and retain talented teachers.

^Emphasis in the original.

In sum, before they introduce new policies for certification, tenure, 
promotion, merit pay, master teachers, and differentiated staffing, educators
and policymakers will want the answers to such questions as:

• Can one teacher evaluation system reward superior teachers,
encourage average ones, and improve or terminate the employment
of poor ones? Can one system be used for teacher
improvement as well as personnel decisions? Under what conditions?

• How does a person demonstrate an aptitude for teaching? Can 
this aptitude be recognized in a written test? Or must a pro- 
spective teacher be evaluated while teaching? 

• What problems are posed by linking salary, promotion, tenure,
and retention decisions to teacher evaluation? 

• Can teacher evaluation be used by itself to select master teachers
when master teacher is a rank like full professor? When
master teacher is a role like supervising teacher? 

• How can teacher evaluation be used by master teachers to 
supervise probationary teachers? 

THE FOCUS OF THIS STUDY

We designed this study to assess teacher evaluation practices with a 
view to analyzing how teacher evaluation can be used to improve personnel
decisions and staff development. In this report, we describe
four school districts that use teacher evaluation for these purposes.
We discovered in the course of this study, however, that relatively few
school districts have highly developed teacher evaluation systems, and
even fewer put the results into action. This discovery suggests that 
most school systems will have to develop teacher evaluation systems 
before they can introduce innovative personnel practices. 

The report explains how the master teacher concept operates in four
school districts. It discusses this particular concept because the districts
that we visited happened to use expert teachers (variously
defined and titled) to help with teacher evaluation and staff development.
Our report does not directly address the use of teacher evaluation
results for merit pay and the selection of master teachers, because
none of the districts considered for this study used teacher evaluation
for merit pay and none used teacher evaluation results by themselves
to select master teachers. Nevertheless, our findings provide important
insights into the problems and prospects associated with these proposals.



THE METHODOLOGY 

The study began with a review of the literature (summarized in Section II)
and a preliminary survey of 32 districts. We used a two-stage
reputational sampling process to find school districts with highly 
developed teacher evaluation practices, obtaining nominations from the 
literature on teacher evaluation, members of our advisory panel, 
researchers, and practitioners. We conducted exploratory interviews in 
32 sites, speaking at length with the individual having primary respon- 
sibility for teacher evaluation, and collected relevant record data, such 
as district evaluation goal statements, evaluation instruments, and col- 
lective bargaining agreements. 

To select the case study districts, we considered demographic cri- 
teria, organizational criteria (e.g., degree of centralization), the 
district's primary purposes for teacher evaluation, teacher evaluation 
processes, methods and assumptions, and, after a preliminary assess- 
ment, the degree of implementation of the system. We finally selected 
four school districts representing diverse teacher evaluation processes
and organizational environments: Salt Lake City, Utah; Lake Wash- 
ington, Washington; Greenwich, Connecticut; and Toledo, Ohio. 

Before visiting each school district, we reviewed the documentation 
pertaining to school district personnel and teacher evaluation policies. 
We then spent a week in each district interviewing the superintendent, 
the director of personnel, most senior administrators in the central 
office, and other central office staff concerned with teacher evaluation.
We also interviewed officers and executives of the local teachers' orga- 
nizations, school board members, parent and community representa- 
tives, and knowledgeable reporters from the local media. 

In each school district, we visited six schools of varying grade levels, 
size, and neighborhood type. At each school, we interviewed the principal,
other specialized (differentiated staff) personnel, and at least six
teachers, including the teachers' organization building representative. 

From central administrators, we sought an understanding of the political
and organizational contexts, the origin of and motivation for the
particular teacher evaluation process in use, the formal description of
policy, and the uses to which results are put. From principals, we
sought an understanding of how the process is implemented and how it
affects their job and their ability to attain instructional and other
school goals.

From teachers, we sought an understanding of how teacher evalua- 
tion affects the day-to-day life of the school and the quality of their 
instruction, and how they perceive the nature of teaching work in their 
district. From teachers' organization officials, we sought an under- 
standing of how teacher evaluation affects management-labor relations 
and how they perceive the evaluation process with respect to thorough- 
ness, fairness, reliability, and validity. 

From community-based representatives, we sought an understanding 
of community perceptions of teacher quality and the teacher evaluation 
process. In all cases, we sought to ascertain general and role-specific 
perceptions of the teacher evaluation system and its function in 
improving the overall quality of instruction in the district. 

Section II reviews teacher evaluation procedures in the light of vari- 
ous conceptions of teaching; it then provides an overview of teacher 
evaluation practices in 32 school districts. Section III summarizes the 
findings of the four case studies, analyzes the similarities and differ- 
ences in the approach of the four districts to teacher evaluation, and 
describes what makes these approaches work.^ Section IV assesses 
teacher evaluation processes as to their reliability, validity, and utility. 
Section V sets forth our conclusions and recommendations for the 
design and implementation of teacher evaluation processes that will 
work. 



^These case studies are presented in greater detail in Arthur E. Wise, Linda
Darling-Hammond, Milbrey W. McLaughlin, and Harriet T. Bernstein, Case Studies for Teacher
Evaluation: A Study of Effective Practices, The Rand Corporation, N-2133-NIE, June 
1984. 




II. A PRELIMINARY LOOK AT TEACHER EVALUATION



This section lays the groundwork for the case studies that provided 
the main data for the report. We review, first, the theory that 
informed the study and, second, the findings of a survey of teacher 
evaluation practices in 32 districts. 



CONCEPTUAL FRAMEWORK 

We present here a conceptual framework for the study of teacher
evaluation in the context of school organizations.^ We examine the dif- 
ferent conceptions of teaching and school organization that underlie 
teacher evaluation to determine whether teacher evaluation practices 
achieve the purposes for which they are intended.

Much existing literature on teacher evaluation examines instruments 
and techniques for evaluation without reference to their theoretical 
underpinnings or to the organizational contexts in which they are to be 
used. Without such reference, potential users (for example, school district
administrators) cannot easily assess whether a particular
approach will suit their purposes, conceptions of education, or organi- 
zational characteristics. Nor can they predict the effectiveness of the 
approach in achieving its purposes or its other likely outcomes. With 
theory, knowledge gained from other districts' experiences, and 
knowledge of their own districts, potential users can make informed 
estimates of probable local effectiveness and effects. 

Teacher evaluation, if it is to work, must satisfy competing individ- 
ual and organizational needs. It must balance the centralization and 
standardization needed for personnel decisions against the flexibility
and responsiveness needed for helping teachers to improve. To make 
teacher evaluation work, districts must achieve this balance. 

Theoretical Conceptions of Teaching 

A teacher evaluation system must define the teaching task and pro- 
vide a mechanism for judging the teacher. Here we look at teaching as 
labor, craft, profession, and art. These four ways of viewing teaching,
by revealing the assumptions that lie behind different techniques for
evaluating teachers, provide a theoretical framework for analyzing
teacher evaluation.

^See also Darling-Hammond et al. (1983).

Under the conception of teaching as labor, teaching activities are 
"rationally planned, programmatically organized, and routinized in the 
form of standard operating procedures" by administrators (Mitchell
and Kerchner, 1983, p. 35). The teacher is responsible for implement- 
ing the instructional program in the prescribed manner and for adher- 
ing to the specified routines and procedures. 

The evaluation system of teaching as labor involves direct inspection 
of the teacher's work— monitoring lesson plans, classroom performance, 
and performance results; the school administrator is seen as the
teacher's supervisor. This view of teaching assumes that effective prac- 
tices can be concretely determined and specified and that adherence to 
these practices produces the desired results. 

Under the conception of teaching as a craft, teaching requires a 
repertoire of specialized techniques. Knowledge of these techniques 
also includes knowledge of generalized rules for their application. Once 
the teaching assignment has been made, the teacher is expected to 
carry it out without detailed instructions or close supervision. 

When teaching is considered a craft, evaluation is indirect and 
involves ascertaining that the teacher has the requisite skills. The 
school administrator is seen as a manager who holds teachers to gen- 
eral performance standards. This view of teaching assumes that gen- 
eral rules for applying specific techniques can be developed and that 
proper use of the rules combined with knowledge of the techniques will 
produce the desired outcomes. 

Under the conception of teaching as a profession, teaching requires 
not only a repertoire of specialized techniques but also the exercise of 
judgment about when those techniques should be applied (Shavelson 
and Stern, 1981). To exercise sound professional judgment, the teacher 
must master a body of theoretical knowledge as well as a range of tech- 
niques. Broudy (1956) distinguishes between craft and profession in 
this way: "We ask the professional to diagnose difficulties, appraise 
solutions, and to choose among them. We ask him to take total 
responsibility for both strategy and tactics. . . . From the craftsman, by 
contrast, we expect a standard diagnosis, correct performance of pro- 
cedures, and nothing else" (p. 182). 

Standards for evaluating professionals are developed by peers, and 
evaluation focuses on the degree to which teachers solve professional 
problems competently; the school administrator is seen as an adminis- 
trator who ensures that teachers have the resources necessary to carry 
out their work. This view of teaching assumes that standards of pro- 
fessional knowledge and practice can be developed and assessed and 
that their enforcement will ensure competent teaching.

Under the conception of teaching as an art, teaching techniques and
their application may be novel, unconventional, or unpredictable. This 
does not say that techniques or standards of practice are ignored; it 
says, rather, that their form and use are personalized and not standard- 
ized. 

As Gage (1978) explains, the teaching art involves "a process that
calls for intuition, creativity, improvisation, and expressiveness—a process
that leaves room for departures from what is implied by rules, formulas,
and algorithms" (p. 15). He argues that teaching uses science
but is not itself a science because the teaching environment is not 
predictable. In this view, the teacher must draw upon not only a body 
of professional knowledge and skill, but also a set of personal resources 
that are uniquely defined and expressed by the personality of the 
teacher and his or her individual and collective interactions with stu- 
dents. 

Because teaching viewed as an art encompasses elements of personal 
insight (as well as theoretically grounded professional insight), the 
teacher as an artist exercises considerable autonomy in the perfor- 
mance of his or her work. Evaluation involves both self-assessment 
and critical assessment by others. Such evaluation entails "the study 
of holistic qualities rather than analytically derived quantities, the use 
of 'inside' rather than externally objective points of view" (Gage, 1978, 
p. 15). It relies on judgmental ("high-inference") rather than countable 
("low-inference") variables, on assessment of patterns of events rather 
than counts of specific, discrete behaviors (Eisner, 1978; Gage, 1978). 

In the view of teaching as an art, the school administrator is seen as 
a leader who encourages the teacher's efforts. The view assumes that 
teaching patterns (i.e., holistic qualities of a teacher's approach) can be 
recognized and assessed by using both internal and external referents
of validity. 

Obviously, these four conceptions of teaching represent ideals that 
do not exist in pure form in the real world. In fact, various com- 
ponents of a teacher's work embody different ideal types (e.g., motivat- 
ing students, performing hall duty, presenting factual information, 
establishing and maintaining classroom relationships). Nonetheless, 
the conceptions of teaching signal different definitions of success in a 
teacher evaluation system. 

The disparity implicit in views of teacher evaluation cannot be 
ignored. McNeil and Popham (1973), for example, make a strong case 
for evaluating teachers by their contribution to the performance of stu- 
dents, as measured by standardized test scores, rather than by the use 
of teacher process criteria. Millman (1981) also argues that "criteria 
and techniques for the fair use of student achievement in both the
formative and summative roles of teacher evaluation can be devised."
This view presupposes that students' learning as measured by their test
performance is a direct function of teaching performance and it mea- 
sures a teacher's worth in terms of the product or output of his work. 
Thus, it envisions teaching as labor and the student as raw material. 

The vast majority (89 percent) of teachers, however, do not consider 
scores on standardized achievement tests a valid measure of teacher 
effectiveness (National Education Association, 1979). The views of
most teachers are based on two notions: First, test scores are limited 
measures of student outcomes; second, other factors or dynamics of the 
teaching and learning process are at least as important in determining 
learning outcomes as the teacher's performance. These other factors 
encompass school and home conditions not under the teacher's control 
and the unpredictable elements inherent in human interaction that 
give rise to a conception of teaching as profession or art. 

Conceptions of Teaching in Teaching Research 

Although the various conceptions of teaching differ along several 
dimensions, one can usefully view them as incorporating increasing 
ambiguity or complexity with regard to the performance of teaching 
tasks as one moves from labor at one extreme to art at the other. The 
role of the teaching environment in determining teacher behavior also 
increases in importance as one moves from labor to art. The more 
variable or unpredictable one considers the teaching environment, the 
more one is impelled to conceive of teaching as a profession or art. 

Gage (1978) describes how the elements of predictability and 
environmental control differentiate teaching as a science from teaching 
as an art. Teaching as a science, he observes, "implies that good teaching
will some day be attainable by closely following rigorous laws that
yield high predictability and control" (p. 17). He goes on to say, however,
that using science to achieve practical ends requires artistry: the
use of judgment, intuition, and insight in handling the unpredicted, 
knowledge of when to apply which laws and generalizations and when 
not to, the ability to make clinical assessments of how multiple vari- 
ables affect the solution of a problem. 

Research on teaching parallels these conceptions of teaching in the 
degree to which predictability and environmental controls are assumed 
or even considered in the design and goals of the research. Some 
efforts to link specific teacher characteristics or teaching behaviors to 
student outcomes have sought context-free generalizations about what 
leads to or constitutes effective teaching.
This line of research strongly suggests that what teachers do in the
classroom does affect students. However, assertions that discrete sets 
of behaviors consistently lead to increased student performance (e.g., 
Medley, 1979; Rosenshine and Furst, 1971; Stallings, 1977) have been 
countered by inconsistent and often contradictory findings that under- 
mine faith in the outcomes of simple process-product research (e.g., 
Doyle, 1978; Dunkin and Biddle, 1974; Shavelson and Dempsey- 
Atwood, 1976). 

Researchers have found that effective teaching behaviors vary for 
students of different socioeconomic, mental, and psychological characteristics
(e.g., Brophy and Evertson, 1974, 1977; Cronbach and Snow,
1977; Peterson, 1976) and for different grade levels and subject areas
(Gage, 1978; McDonald and Elias, 1976). Furthermore, interaction
effects that may be identified in teaching research are not confined to 
easily translatable two- or even three-way interactions. This condition 
severely constrains their generalizability for establishing rules of practice
(Knapp, 1982; Shavelson, 1973; and Cronbach, 1975).

Teaching behaviors that have sometimes proved effective when used 
in moderation can produce significant and negative results when
overused (Peterson and Kauchak, 1982; Soar, 1972), or when applied in 
the wrong circumstances (see, e.g., Coker, Medley, and Soar, 1980;
McDonald and Elias, 1976). This kind of finding discourages the 
development of rules for teaching behaviors that can be applied generally.

A more problematic finding is that the effectiveness of differing 
teaching behaviors depends on the goals of instruction. Instructional
acts that seem to increase achievement on basic skills tests and factual
examinations in many cases differ distinctly from those that seem to
increase complex cognitive learning, problem-solving ability, and
creativity (McKeachie and Kulik, 1975; Peterson, 1979; Soar, 1977;
Soar and Soar, 1976).

We consider this finding related to goals problematic because if 
markedly different teaching behaviors lead to divergent results that can
be deemed equally desirable, one cannot identify a single, unidimensional
construct called effective teaching, much less delimit its component
parts. One can, at best, pursue alternative models of effective
teaching, making explicit the goals underlying each.

Clearly, the design of teacher evaluation systems depends critically
on educational goals; as conceptions of goals vary from unidimensional
to multidimensional, so conceptions of appropriate teaching activities
vary from easily prescribed to more complex teaching acts resting on
the application of teacher judgment. In short, as one ascribes different
degrees of generalizability to effective teaching behaviors and different
weights to context-specific variables, one implicitly embodies different
conceptions of teaching. The more complex and variable one considers
the educational environment, the more one relies on teacher judgment
to guide the activities of classroom life and the less one relies on
generalized rules for teacher behavior.

Purposes of Teacher Evaluation 

As indicated in Fig. 1, teacher evaluation may serve four basic pur- 
poses. The matrix artificially represents these purposes and levels of 
decisionmaking as distinct. In fact, teacher evaluation may apply to 
small or large groups of teachers (rather than simply individuals or 
whole schools) and may represent degrees of combined improvement 
and accountability concerns (as when promotion decisions are linked to
improvement efforts).

Although many teacher evaluation systems are nominally intended 
to accomplish all four of these purposes, different processes and 
methods may better suit one or another of these objectives. In particu- 
lar, improvement and accountability may require different standards of 
adequacy and evidence. Individual or organizational concerns also may 
demand different processes (for example, bottom-up or top-down
approaches to change, or unstandardized or standardized remedies for 
problems). 



Level            Purpose: Improvement    Purpose: Accountability

Individual       Individual staff        Individual personnel
                 development             decisions (e.g., job status)

Organizational   School improvement      School status decisions
                                         (e.g., accreditation)



Fig. 1. Basic purposes of teacher evaluation

Fenstermacher and Berliner (1983) illuminate these differences with
respect to staff development (our improvement dimension), although 
their observations are applicable to accountability purposes as well.
Their definition of staff development encompasses four scales along
which approaches may differ: 

Staff development activities may be internally proposed or externally
imposed, in order to effect compliance, remediate deficiencies, or 
enrich the knowledge and skills of individual teachers or groups of 
teachers, who may or may not have a choice to participate in these
activities (p. 5). 

According to Fenstermacher and Berliner, as participant roles and 
organizational levels become more differentiated, the profile of a staff 
development activity tends to shift from internal to external initiation,
from an enrichment to a compliance focus, from participation by indi- 
viduals or small groups to standardized programs for large groups, and 
from voluntary to involuntary participation. 

For purposes of accountability, teacher evaluation processes must be 
capable of yielding fairly objective, standardized, and externally defen- 
sible information about teacher performance. For improvement objec- 
tives, evaluation processes must yield rich, descriptive information that 
illuminates sources of difficulty as well as viable courses for change. 
To inform organizational decisions, teacher evaluation methods must 
be hierarchically administered and controlled to ensure credibility and 
uniformity. To assist decisionmaking about individuals, evaluation 
methods must consider the context in which individual performance
occurs to ensure appropriateness and sufficiency of data. 

Although these purposes and the approaches most compatible with 
them are not necessarily mutually exclusive, an emphasis on one may 
tend to limit the pursuit of another. Similarly, while multiple methods 
may (and, many argue, should) be used for evaluating teachers, school
systems must consider the purposes that each serves to ensure that
teacher evaluation goals and processes do not conflict. In short, they 
must recognize potential conflicts before adopting a teacher evaluation 
system. 

Changing Teacher Behavior 

The primary goal of teacher evaluation is the improvement of indi- 
vidual and collective teaching performance in schools. To improve a 
teacher's performance, the school system must enlist the teacher's 
cooperation, motivate him (or her), and guide him through steps 
needed for improvement to occur. For the individual, improvement
relies on the development of two important conditions: (1) the
knowledge that a course of action is the correct one and (2) a sense of 
empowerment or efficacy, that is, a perception that pursuing a given 
course of action is both worthwhile and possible. 

Most teacher evaluation processes identify effective teaching without 
addressing the question of how to change teaching behavior. The ini- 
tiators of such processes assume that once they have discovered what 
ought to be done, teachers will naturally know what to do and will do 
it. 

Fenstermacher (1978) argues, however, that "if our purpose and
intent are to change the practices of those who teach, it is necessary to 
come to grips with the subjectively reasonable beliefs of teachers" (p. 
174). This means creating internally verifiable knowledge rather than 
imposing rules of behavior. It assumes, first, that teachers are rational 
professionals who make judgments and carry out decisions in an uncertain,
complex environment and, second, that teachers' behavior is
guided by their thoughts, judgments, and decisions (Shavelson and
Stern, 1981). Thus, behavior change requires transformation of belief 
structures and knowledge in a manner that allows for situation-specific 
applications. 

A sense of efficacy is an important element of the link between 
knowledge and behavior. This sense affects performance by generating 
coping behavior, self-regulation of refractory behavior, perseverance, 
responses to failure, growth of intrinsic interest and motivation, 
achievement strivings, and career pursuits (Bandura, 1982; Bandura 
and Schunk, 1981; Bandura et al., 1980; and DiClemente, 1981). A
sense of efficacy is not an entirely internal construct; it requires a 
responsive environment that allows for and rewards performance
attainment (Bandura, 1982, p. 140). However, the individual must 
value the goals and the goals must challenge the individual, or the task 
performance will be devalued (Lewin, 1938; Lewin et al., 1944). 

A review by Fuller et al. (1982) of the research on individual efficacy
in the context of organizations suggests that, with respect to teacher 
evaluation, increased performance and organizational efficacy for 
teachers will result from: 

• Convergence between teachers and administrators in accepting
the goals and means for task performance (Ouchi, 1980)

• Higher levels of personalized interaction and resource exchange
between teachers and administrators (Talbert, 1980)

• Lower prescriptiveness of work tasks (Anderson, 1973)

• Teachers' perceptions that evaluation is soundly based and that
evaluation is linked to rewards or sanctions

• Teacher input into evaluation criteria, along with diversity of
evaluation criteria (Pfeffer et al., 1976; Rosenholtz and Wilson,
1980).

These findings agree with those of Natriello and Dornbusch
(1980-1981) on determinants of teachers' satisfaction with teacher
evaluation systems. They found teacher satisfaction strongly related to 
(a) perceptions that all evaluators share the same criteria for evaluation;
(b) more frequent samplings of teacher performance; (c) more frequent
communication and feedback; and (d) teachers' ability to affect the
criteria for evaluation. Furthermore, frequency of negative feedback
did not cause dissatisfaction, but infrequency of evaluation did.

Teacher satisfaction with evaluation, then, seems to rest on the per- 
ception that evaluation is soundly based, that is, that the teacher has 
some control over both task performance and its assessment. This per- 
ception influences the teacher's sense of performance efficacy (Fuller et 
al., 1982, p. 24).

Finally, opportunities for self-assessment and for reference to personal
standards of performance strongly influence the sense of efficacy
and motivation. The teacher evaluation literature has begun to recog- 
nize the importance of both self-assessment (Bodine, 1973; Bushman, 
1974; Riley and Schaffer, 1979) and allowing teacher input into the 
determination of evaluation criteria and standards (Knapp, 1982). As 
Bandura (1982) observes: 

In social learning theory an important cognitively based source of 
motivation operates through the intervening processes of goal setting 
and self-evaluative reactions. This form of self-motivation, which 
involves internal comparison processes, requires personal standards 
against which to evaluate performance (p. 134). 



Teacher Evaluation in the Organizational Context 

Recent policy analysis and program evaluation research to explain 
policy effects recognizes the importance of organizational considera- 
tions (Sabatier and Mazmanian, 1979; Sproull, 1979; Wildavsky, 1980). 
Formal policies and procedures, the research has found, may constrain, 
but do not construct, the final outcomes of any institutional endeavor. 

The local implementation process and organizational characteristics,
such as institutional climate, organizational structures and
incentives, local political processes, expertise, and leadership style,
determine the ultimate success of a policy in achieving its intended
effects (Berman and McLaughlin, 1978; Mann, 1978; Weatherley and 
Lipsky, 1977). Effective change requires a process of mutual adaptation
in which agents at all levels can shape policies to meet their
needs, one in which the convergence of internal and external factors
transforms both the participants and the policy. 

The implementation of any school policy, including a teacher evalua- 
tion policy, represents a continuous interplay among diverse policy 
goals, established rules and procedures (concerning both the policy in 
question and other aspects of the school's operation), intergroup bar- 
gaining and value choices, and the local institutional context. The po- 
litical climate of the school system, the relationship of the teachers' 
organization to district management, the nature of other educational 
policies and operating programs in the district, and the size and structure
of the system and its bureaucracy all influence teacher evaluation
procedures. 

SURVEY OF PRACTICES IN 32 SCHOOL DISTRICTS 

As a first step in our empirical research, we conducted an exploratory
assessment of 32 reputedly well-developed teacher evaluation systems.
The following subsections describe the characteristics of the
school districts, the similarities and differences in their teacher evalua- 
tion activities, some major problems in teacher evaluation, and some 
major effects of evaluation. 

District Characteristics 

We surveyed local educational agencies (LEAs) in a broad range of
rural and suburban districts, medium-size cities, and large urban areas.
Minority enrollment in these LEAs ranged from 1 percent to 75 percent.
The proportion of Chapter I eligible students^ varied from less
than 1 percent to over 40 percent. District wealth, as indicated by
per-pupil expenditure, varied from $1,400 to more than $3,000.

Despite this substantial contextual variety, the sample LEAs had the 
following common features: 

• All had a relatively mature teaching force; the average was 14
years of service.

• All but three faced declining student enrollments; all faced
moderate to severe financial retrenchment. As a result, most
had been required to dismiss teachers.

• Teachers were organized in all but two; 25 of the 30 organized
LEAs had a collective bargaining agreement with their teachers,
and 20 agreements included teacher evaluation. These agreements
typically focused on procedural rather than substantive
issues.

Program Characteristics

The district teacher evaluation practices that we examined differed 
substantially in detail. These differences appeared primarily in local
implementation choices, that is, in how to put a particular procedure
into practice. In broad outline, however, district practices were
remarkably similar; indeed, much more so than we had expected, given the state
of the art reported in the literature. 

Similarities. Each of the 32 districts had had a teacher evaluation
scheme in place prior to the present practice. In some districts, the
former system was simply a paper activity, a routine task that occupied
little time or attention. In the majority of districts, however,
antecedent evaluation activities represented a serious concern on the 
part of LEA administrators and boards of education. 

District officials and teachers had been dissatisfied with the way 
evaluation was conducted and the type of information produced. In 
particular, local officials criticized their earlier, typically narrative 
evaluation systems as too formal, too subjective, inconsistent, and inef- 
ficient. They sought to remedy these deficiencies with the present 
evaluation practices. 

Interestingly, teachers strongly advocated a revised and a more 
standardized evaluation effort. In their view, narrative evaluation pro- 
vided insufficient information about the standards and criteria against 
which teachers were evaluated and resulted in inconsistent ratings 
among schools— ratings that depended on the judgment of the building 
principal rather than uniform district objectives for teacher perfor- 
mance. 

Although almost all districts initiated their present evaluation systems
in an effort to develop a stronger and more consistent strategy,
state-level action played an important role in the initial development of 
teacher evaluation in a number of LEAs. Many states have guidelines 
or legislation about teacher evaluation. However, these state-level 
requirements differ markedly in specificity and authority. In New 
Mexico, for example, legislation requires only that all districts keep 
records on personnel performance. Other states, in contrast, have 
specific mandates and guidelines as to the nature, frequency, and level 
of local teacher evaluation. 

California, Connecticut, New Jersey, and Washington take a strong 
position on teacher evaluation, specifying the purpose and nature in 
some detail. Washington State goes so far as to outline the broad phi- 
losophy guiding its teacher evaluation requirements and to suggest a 
model to guide local practice. Connecticut, too, has taken a particu- 
larly active role by providing grants to support local development 
efforts.


Local respondents in these states cited state mandates as a major 
factor in the initiation and development of their teacher evaluation 
efforts. LEA officials with strong commitment to teacher evaluation
were able to build comprehensive local activities on this state author- 
ity. In particular, thanks to state action, teacher evaluation is no 
longer discretionary. 

The teacher evaluation practices that we examined shared — in addi- 
tion to common reasons for initiation — a common process of develop- 
ment. With few exceptions, well-organized committees of teachers, 
administrators, union representatives, principals, and sometimes 
parents had instituted the new systems. These committees took, on 
average, between six months and a year to develop a teacher evaluation 
process and design instruments. Some LEAs relied on outside 
consultants— in particular, Richard Manatt, George Redfern, and 
Madeline Hunter — for advice and adopted their models in part or full. 
Most districts, however, developed their own evaluation practices 
without outside assistance. 

Given the local origin of these teacher evaluation practices, they 
showed a surprising consistency in goals and criteria for evaluation. 
Our review of the literature identified four broad goals of teacher 
evaluation: personnel decisions, staff development, school improve- 
ment, and accountability (see Fig. 1, above). 

These goals differ in theory: Personnel decisions involve teacher 
placement and tenure; staff development focuses on the identification 
of areas for teacher in-service training; school improvement concentrates
on upgrading the quality of instruction; and accountability
centers on setting and meeting LEA standards. Our conversations 
with district administrators suggested, however, that these differences 
are less apparent or meaningful in practice. 

Asked to identify the major purpose of their teacher evaluation system,
respondents in 12 districts specified staff development or school
improvement purposes, 6 cited accountability, and 2 cited personnel
decisionmaking. However, with only three exceptions, LEA administrators
had difficulty specifying the primary goal of teacher evaluation.
In practice, they asserted, teacher evaluation serves all four purposes.
The differences among systems essentially reflected the somewhat
different weighting applied by various LEAs.

The 32 districts also used similar criteria or categories of teacher 
competency. Although district practices differed somewhat in language 
or sequence, the majority of teacher evaluation efforts addressed five 
broad factors: 

• Teaching procedures

• Classroom management

• Knowledge of subject matter

• Personal characteristics

• Professional responsibility.

Likewise, at a general level, the 32 LEAs employed similar evaluation
processes. In 28 districts, the formal process called for a
preevaluation conference between the teacher and evaluator in which
evaluation goals were clarified and the evaluation process was specified.
All districts used classroom observation to evaluate teacher performance
and scheduled a postevaluation conference to discuss evaluator findings
and reactions. In addition, 28 LEAs concluded this postevaluation
conference with a written agreement between teacher and evaluator
about a plan of action based on findings. In 26 districts, this plan of
action included formal district follow-up procedures.

Districts were also alike in what they left out of teacher
evaluation. Twenty eschewed self-evaluation as part of their procedures;
24 made no provision for peer review. Only one district had a
system built on established teacher competencies. Only seven considered
student achievement scores in the evaluation process, and they noted
that they did so more to indicate a problem than to assess teacher
performance.

District teacher evaluation practices also showed similarities in
terms of the locus of responsibility and source of funding. Responsibility
for teacher evaluation, with few exceptions, was located in either
the personnel division or staff development division.

Interestingly, the location of teacher evaluation responsibilities in
one or the other division did not appear to signal substantive differ- 
ences in LEA philosophy or approach to teacher evaluation. For exam- 
ple, some systems that emphasized teacher development and clinical 
supervision gave the personnel division responsibility for the program. 
In contrast, several districts that stressed teacher outcomes and 
categorized program goals in terms of accountability assigned responsi- 
bility to staff-development or support-service units. 

With only two exceptions,^ financial support for teacher evaluation
came from general administrative funds; it was not a line item in dis- 
trict budgets. Respondents saw teacher evaluation as part of an 
administrator's job and thus not requiring special funding. A number 
of respondents said that their teacher evaluation system "doesn't cost 
anything." However, as we discuss below, this approach may 
compromise teacher evaluation when those responsibilities are added to
the other duties of central office personnel and building principals.

Finally, in 25 districts, the building principal evaluated the teachers
in his or her school. In only a few districts did principals share this
function with other district administrators, such as instructional
supervisors. The number of evaluations required of a principal can be quite
large, depending on the size of the school and the LEA's schedule for
teacher evaluation. On average, however, respondents indicated that
principals are responsible for comprehensively evaluating approximately
15 to 20 teachers each year. Preevaluation conferences, multiple
classroom observations, and postevaluation briefings thus combine to
make teacher evaluation a time-consuming chore for most building
administrators.

In summary, at the broad levels of purpose, criteria, procedure, and 
structure, the teacher evaluation practices that we examined showed 
remarkable similarities. These similarities, however, masked substan- 
tive and significant differences in the ways teachers were actually 
evaluated. As we will show, these differences in implementation pro- 
duced variations in the ways in which system participants perceived 
the evaluation effort and the extent to which it served its stated
purpose.

Differences. LEA teacher evaluation practices differed in the type 
and amount of training given evaluators, the frequency of evaluation, 
instrumentation, level of integration with ongoing district activities, 
and the extent to which administrator evaluation complements teacher 
assessment. 

Although only three respondents said that the district provided little 
or no training for evaluators, the significantly different level of training 
offered in our sample LEAs was bound to influence the confidence and 
competence of evaluators. Evaluator training ranged from low and 
infrequent to high and intensive. A district at the low end, for exam- 
ple, provided no formal training; instead, the LEA administrator 
responsible for teacher evaluation visited each school to talk with the 
principal about evaluation activities.

At the high end, some districts scheduled regular training sessions 
throughout the year, provided intensive in-service training in evalua- 
tion before school started, and brought teacher evaluation experts into 
the district (or provided funds for district personnel to travel to confer- 
ences or other districts). One district sponsored a Principals' Institute 
as part of its Instructional Improvement Program for Educational 
Leaders; teacher evaluation was a major institute topic. 



The number of evaluations that a district required varied widely.
For nontenured teachers, evaluations ranged from a low of once a year
to a high of twice a month during the first year of teaching. For
tenured teachers, some districts evaluated only when a teacher's
contract came up for renewal (every three or four years); other districts
evaluated once a year, with a minimum of two classroom observations.

The instruments used to evaluate teacher performance ranged from
those using only a narrative form to those using a straightforward
pass/fail measure of specified criteria. Most evaluation instruments,
falling somewhere in between, used some form of scaling device. These
instruments varied in number of points on the scale (3, 5, or 7), the
extent to which they required additional evaluator comment or
justification for a rating, and whether they included teacher response to
evaluator comments. Together, these differences in the frequency and
nature of teacher evaluation meant that district staff received
significantly different types and amounts of information about teacher
performance.

The local teacher evaluation practices that we examined also dif- 
fered in the extent to which they were integrated into district activities 
or operated in relative isolation. For example, they differed in the 
degree to which adherence to district curriculum guides was an evalua- 
tion factor. For some districts in which it was not a factor, the disre- 
gard of curriculum guides in the evaluation process reflected the fact 
that, in the opinion of respondents, curriculum guides were under- 
developed. 

Given that curriculum guides were fairly well developed, however, 
this diversity suggested variations in the district coordination of 
instructional management and evaluation. That is, LEAs that did not 
incorporate curriculum guides into teacher evaluation were unlikely to 
view teacher evaluation as a way to direct instructional practices. In 
contrast, districts that tied teacher evaluation to curriculum guides 
tended to see evaluation and instructional development as a piece: the 
goals specified in curriculum guides were expected to be addressed in 
the classroom. 

Substantive relationships between staff development and teacher
evaluation also differed substantially. Only five districts in our sample
reported that teacher evaluation had no influence on staff development
activities.

In only a few districts, however, were the results of annual teacher
evaluations explicitly fed into the planning and design of district
in-service education activities. In one of these districts, for example, the
positions of personnel director and staff development coordinator had
been combined into a single position to ensure close coordination
between classroom practices and LEA in-service programs. Districts
using a form of Madeline Hunter's clinical supervision model also
maintained a relatively close relationship between evaluation criteria
and staff development practices.

For most districts, the relationship between teacher evaluation and
staff development was less clear and certainly less formal. Indeed, we
inferred from respondents' comments that where a relationship existed
between these two LEA activities, it was temporary and incidental.

Instead of the routinized and explicit coordination of teacher evaluation
and staff development reported in a few districts, in most LEAs
these activities appeared to function more or less independently of each
other. Teacher evaluation seemed more nearly a "categorical" activity.
On the face of it, this general lack of integration among teacher
evaluation, staff development, and district curriculum guides raised questions
about the effectiveness with which teacher evaluation activities could
address such purposes as staff development and school improvement.

Finally, LEAs varied in the extent to which administrator evaluation 
operated in the same scope and depth as teacher evaluation. Respon- 
dents in 26 districts reported that annual administrator evaluations 
were required, often by state mandate. However, administrator evalua- 
tion practices were, for most of our sample, significantly less well 
developed than those involving teachers. 

Typically, administrator evaluation consisted of a yearly narrative 
prepared by an administrator's superior. In only a few districts had 
administrator evaluation received serious attention and concern; these 
LEAs were reviewing teacher and administrator evaluation and were 
planning to develop a new system. In the remaining LEAs, however, 
the lack of attention to administrator evaluation suggested that this 
area was seen as separate and distinct from teacher evaluation prac- 
tices. 

In summary, the teacher evaluation practices that we examined dif- 
fered substantially. Although these practices seemed similar in broad 
outline, they diverged as local implementation choices were made. Our 
preliminary assessment of local teacher evaluation activities led us to 
conclude that LEAs do not agree on what constitutes the best practice 
with regard to instrumentation, frequency of evaluation, the role of the 
teacher in the process, or how the information could or should inform
other district activities. In our view, this lack of consensus signals
more than differences in notions of practices appropriate to a particular
setting.

These differences in practices, we believe, indicate that teacher 
evaluation presently is an underconceptualized and underdeveloped 
activity. Although almost all districts that we investigated had one or
more particularly strong features, in only a few did teacher evaluation
practices represent a well-developed system in which relationships
among various evaluation activities were thought through and
relationships between teacher evaluation and other district practices were
established.

Major Problems of Teacher Evaluation

Despite differences in level of development and diversity of local
implementation choices, the major problems associated with teacher
evaluation practices were similar in the 32 districts surveyed. Indeed,
agreement among respondents about difficulties encountered in teacher
evaluation underscores our conclusion that important conceptual work
remains to be done in this area.

Two important problem areas may be inferred from respondent perceptions
of teacher evaluation practices. Almost all respondents, even
those who believed that principals supported the teacher evaluation 
program, felt that principals lacked sufficient resolve and competence to 
evaluate accurately. They frequently cited role conflict as the reason. 

Central office respondents believed that the conflict between the
principal as instructional leader and evaluator has not been settled.
Noting that collegial relationships lead many principals to want to be
"good guys," many respondents felt that principal evaluations were
upwardly biased. Principals' disinclination to be tough makes the early
identification of problem teachers difficult and masks important
variations in teacher performance.

In addition, most respondents said that principals considered evaluation
a necessary evil or a time-consuming chore. Since in most districts
teacher evaluation has been added to a principal's responsibilities
without taking other functions away or providing additional assistance,
principals' perceptions of evaluation as a burden are probably correct.

Teacher resistance or apathy was the second most frequently cited
problem. Teachers reportedly fully supported their evaluation program
in less than half of our sample districts. Some teacher anxiety almost
certainly stems from evaluation itself. However, by respondent report,
a substantial amount of teacher discomfort results from a third problem
area: lack of uniformity and consistency within a school system.
Even though evaluation instruments have become more standardized,
in many districts teachers believe that the present system still depends
too much on the judgment or predisposition of the principal and leads
to different ratings for similar teacher practices in different schools.

While inconsistency in evaluation judgments stems in part from
instrumentation, it also reflects another problem area: inadequate
training for evaluators. Many respondents felt that staff responsible
for evaluation did not receive enough training and that the training
they received provided insufficient guidance in the conduct of
evaluation.

Respondents also reported difficulty in two other areas: the evaluation
of secondary school staff and the evaluation of specialists. Both
issues involve the difficulty of a generalist evaluator (i.e., the principal)
assessing the competence of a specialist teacher (i.e., secondary-level
chemistry, mathematics, language, and LEA art specialists,
physical-education and vocational-education instructors, and the like). Some
districts have sought to solve the problem at the secondary level by
introducing a form of peer review. But most respondents felt that the
inability of their system to recognize differences in elementary,
secondary, and specialist teacher performance remained an important,
unresolved issue.

Major EffectH of Teacher Evaluation 

A number of respondents shared the view of the LEA administrator 
who said: "Teacher evaluation is one of the most powerful ways to 
impact instruction." The power of teacher evaluation as an improve- 
ment strategy is evident in the positive outcomes that respondents 
attributed to their evaluation system, even when they believed that the 
system needed revision. 

Respondents consistently reported two results of teacher evaluation: 
improved teacher-administrator communication and increased teacher
awareness of instructional goals and classroom practices. Even in the 
less-developed teacher evaluation systems, the process of evaluation — 
preobservation conferences, observation, and postevaluation meet- 
ings—substantially improved teacher-principal relationships and sharp- 
ened teachers' awareness of the goals and process of instruction. 

Improved communication was mentioned frequently. One respondent
said that teachers tell him: "This is the first time I have gotten
meaningful help from my principal." Another cited teacher reports 
that the school climate had improved since evaluation responsibilities 
brought principals into the classrooms regularly. Still another said: 
"Teacher evaluation has brought about a sense of team effort at the 
building level that did not exist before. More teachers and principals 
are beginning to establish common goals." 

An evaluation program reportedly gives teachers an increased sense 
of pride and professionalism and motivates them to improve classroom 
practices. Moreover, teachers take pride in their own support of
evaluation and the professionalism that their support of evaluation 
implies. As one superintendent put it: "Our teacher evaluation pro- 
gram has made teachers prouder of their system. They are proud of 
their role in ensuring academic standards in our schools." 

Respondents attributed part of this sense of pride and professionalism
to the school systems' recognition of the teachers' competencies.
They ascribed another important part to the opportunities for feedback
and discussions about standards of good practice that evaluation pro- 
vides. In short, teacher evaluation has eroded the traditional isolation 
of the classroom teacher. It has improved communication, and it has 
given teachers a sense of task in the loosely coupled system of school 
districts, school buildings, and classrooms. 

In most districts, the teacher evaluation system has also led to
personnel actions. Although few LEAs used evaluation outcomes to
terminate tenured staff, nontenured staff were dismissed on the basis of
evaluation in most sample LEAs. Not surprisingly, LEAs located in
states having particularly restrictive state-level legislation concerning
termination of tenured teachers (e.g., New Jersey) have undertaken
especially thorough evaluations of beginning teachers. However, more
than half of our sample indicated that evaluation has played a major
role in "counseling out" tenured teachers shown to be ill-suited for
teaching.

Other reported results of teacher evaluation include: better
LEA-teacher union relations; improved classroom instruction; student
achievement gains; more funds allocated for staff development; and
increased public confidence in the schools. The extent to which these
outcomes can be attributed to teacher evaluation or, in fact, have
occurred is discussed in our case analyses of four of these 32 districts.

Issues for Case Study Analyses 

The substantive difference in district teacher evaluation practices 
and the problems raised by respondents suggested a number of issues 
for our case study analyses. The role of the principal in teacher 
evaluation emerges as a primary concern. In most districts, the princi- 
pal is the primary if not the sole evaluator of teacher performance. Yet 
respondents report that principals are overburdened, often inadequately 
trained, and constrained in their evaluation function by collegial
relationships with their staff.

Practitioner concerns about the reliability and validity of teacher 
evaluations pose other central concerns. Many respondents pointed to 
insufficient differentiation among types of teachers as a development
problem for teacher evaluation. Do available strategies allow for indi- 
vidual school or teacher differences? 

Local respondents indicated that while their evaluation system had a 
primary goal, in reality it was expected to serve four goals: personnel 
decisions, staff development, school improvement, and accountability.
How realistic is that expectation? Can a single evaluation system
address all four purposes equally well? Which approaches to evalua- 
tion best suit which goals? 

Finally, according to most respondents, their teacher evaluation sys- 
tem "doesn't cost anything." However, even if teacher evaluation does 
not appear in an LEA budget, costs nonetheless are associated with it. 
These costs include not only dollars, but tasks done, however superfici- 
ally, in connection with evaluation, management time devoted to 
developing, monitoring, and negotiating evaluation, teacher time away 
from classrooms or "off-task" in classrooms, and so on. 

Local practitioners must balance LEA teacher evaluation purposes, 
district resources, and traditions. Our case studies analyze the factors 
central to resolving the dilemmas underlying teacher evaluation, in par- 
ticular: 

• Divisions of authority and responsibility among teachers,
principals, and central office administrators in the design and
implementation of the teacher evaluation process

• The degree of centralization and standardization of the manage- 
ment of the process 

• Distinctions between the formal process and the process as 
implemented 

• The extent to which the process balances control and
autonomy, commonality and flexibility.

III. SUMMARY OF STUDY FINDINGS



THE FOUR EVALUATION SYSTEMS IN REVIEW: 
DIFFERENT BUT SIMILAR

Salt Lake City, Lake Washington, Greenwich, and Toledo, the case
study districts, approach the task of teacher evaluation in different
ways. They emphasize different purposes for evaluation; they use
different methods for assessing teachers; and they assign different roles to
teachers, principals, and central office administrators in the evaluation
process.

These evaluation systems nevertheless share implementation
characteristics. These commonalities in implementation, in fact, set these
four systems apart from less successful ones. Moreover, they suggest
that implementation factors contributing to the success of these systems
may also contribute to the success of other formal processes.

The four teacher evaluation systems vary with respect to the primary
evaluators and the teachers who are evaluated. They also differ
with respect to the major purposes of evaluation, the instruments used, 
the processes by which evaluation judgments are made, and the linkage 
between teacher evaluation and other school district activities, such as 
staff development and instructional management. Finally, districts 
represent dramatically different contexts for teacher evaluation in 
terms of student population, financial circumstances, and political
environment. 

Despite these differences in form, the four districts follow certain 
common practices in implementing their teacher evaluation systems. 
Specifically, they pay attention to four critical implementation factors: 

1. They provide top-level leadership and institutional resources
for the evaluation process. 

2. They ensure that evaluators have the necessary expertise to 
perform their task. 

3. They ensure administrator-teacher collaboration to develop a 
common understanding of evaluation goals and processes. 

4. They use an evaluation process and support systems that are 
compatible with each other and with the district's overall 
goals and organizational context. 

Attention to these four factors (organizational commitment, evaluator
competence, collaboration, and strategic compatibility) has elevated
evaluation from what is often a pro forma exercise to a meaningful
process that produces useful results. Although these factors seem to be
straightforward and self-evident requisites for effective evaluation, they 
are not easily accomplished and are usually overlooked in the pressure 
to develop and adopt the perfect checklist or set of criteria for teacher 
evaluation. 

Moreover, the districts are striving to maintain and improve the 
organizational supports and processes on which meaningful evaluation 
depends. They understand that the implementation of the evaluation
process is at least as important as its form. We summarize below the
formal aspects of the four district teacher evaluation processes and how 
they operate in an organizational context.^ 

SALT LAKE CITY: ACCOUNTABILITY 
IN A COMMUNAL CONTEXT 

The hard-nosed yet relatively informal teacher evaluation process in 
Salt Lake City occurs in a state lacking a teacher tenure law and 
state-mandated teacher evaluation. The 25,000-student population of 
Salt Lake is relatively homogeneous for an urban district, and the dom- 
inant Mormon culture emphasizes education, conformity, and coopera- 
tive endeavor. 

The concept of shared governance undergirding the teacher evaluation
process conforms to Mormon community values. Management by
decentralized consensus among parents, teachers, and administrators 
allows widespread input into nearly all aspects of school operations, 
including the assessment of teachers. Teachers are evaluated under a
system based on communal decisionmaking with appeal to a higher 
authority. 

Of the four case study teacher evaluation systems, that of Salt Lake 
centers most explicitly on making personnel decisions in the name of
accountability. The remediation process to which principals may 
assign teachers judged inadequate has resulted in the removal of 37 
teachers over the past nine years and the reinstatement of nearly that 
number of successfully remediated teachers to presumably more pro- 
ductive classroom teaching. Although principals initiate the remedia- 
tion process, a four-member remediation team, composed of two 
administrators and two teachers, conducts the two- to five-month 

^At the risk of oversimplification, we use metaphors to suggest the stylistic and
substantive differences among the districts' approaches to teacher evaluation.
assistance and monitoring process. At the end of the remediation
period, the principal recommends either termination or reinstatement.

The Salt Lake teacher evaluation system relies on an annual
goal-setting exercise in which the principal and teacher confer on which
system, school, or personal goals the teacher will pursue for the coming
year. The system specifies neither the number of observations nor 
their duration. Observations may focus on either the adopted goals or 
a list of "teaching expectancies" included in the collective bargaining 
agreement between the school district and the Salt Lake Teachers 
Association (SLTA). 

The evaluation system does not begin to operate in a highly formal- 
ized manner unless a teacher is performing poorly. Prior to formal 
remediation, a principal may initiate informal remediation, at which 
point observed deficiencies and a specified plan of action are put in 
writing, and the teacher is given additional supervision and assistance.
If informal remediation succeeds, no record of the process enters the 
teacher's personnel file. If it fails, the teacher receives formal
remediation.

Organizational Commitment 

The superintendent gives teacher evaluation and remediation high
priority. He personally redesigned and manages these two elements of 
the governance structure. 

A variety of mechanisms make a teacher's classroom performance a 
legitimate domain of interest for virtually all members of the school 
community. An "open disclosure" policy requires teachers to provide a 
written statement to parents of what they plan to do in each school 
year. Parents' involvement on school community councils allows them 
a voice in such matters as curriculum and staffing patterns. 

In addition, a "review-of-services" process allows anyone to raise a
complaint about virtually any school practice for investigation by a
third party. About one-third of all teachers placed on remediation in
Salt Lake were identified through the review-of-services process. 
Because of the openness of the system, poor performance is usually 
noticed and addressed by the remediation process. 

The transfer and assignment process also draws attention to evalua- 
tion. When the superintendent negotiated an accountability system 
with the teachers' association, he traded job security for
performance-based dismissal. Thanks to this agreement, the Salt Lake school
system cannot lay off teachers because of declining enrollment or budget
shortfalls; it can dismiss them only because of poor performance, if the
remediation process fails. As a result, when positions are cut back in a
school, some teachers— usually those who are performing poorly— are 
declared "unassigned" by the faculty School Improvement Council, and 
an assignment committee composed of teachers and administrators 
tries to find vacancies for them. Repeated lack of assignment due to 
poor performance receives scrutiny at both the school and central 
office levels and triggers the evaluation process. 

The system provides additional resources for remediation. When a 
teacher is placed on informal remediation, the principal may call on 
one of 40 teacher specialists (who are chosen for their outstanding 
teaching ability) to provide classroom assistance to the teacher. When 
formal remediation is instituted, a four-member remediation team is 
assembled. This team includes the principal, one of five learning
specialists from the central office, an SLTA representative trained to
protect the teacher's legal rights, and a teacher with expertise in the
particular subject area or grade level. The team may hire an additional
expert teacher from a pool of those on leave or retired if still more 
assistance seems required. 

Evaluator Competence 

Evaluation requires of those who implement it the ability to make 
both sound judgments about teaching quality and appropriate, concrete
recommendations for improvement of teaching performance. Salt Lake 
achieves this dual evaluation function by dividing responsibility 
between principals and expert teachers. Principals are responsible for 
evaluating teachers and for instigating remediation procedures for 
those who are performing poorly. Once remediation begins, however, 
expert teachers in the appropriate teaching area assume a large portion 
of the assistance function. Salt Lake also operates a peer adviser pro- 
gram for first-year teachers in which skilled, experienced teachers 
receive small stipends and released time to help and to counsel new 
teachers. 

Collaboration 

The Salt Lake Teachers Association collaborated with the board of 
education in designing the district's teacher evaluation system. In
negotiations about the evaluation plan, the association gained a prom- 
ise of job security for its members in return for accountability-based 
remediation and dismissal procedures. The SLTA developed the list of 
"teaching expectancies" that provide the basis for evaluation decisions. 
The remediation teams include two SLTA appointees and two
administrators; one SLTA representative is trained to safeguard the 
legal rights of the teacher on remediation. Although, originally, the 
entire remediation team had to agree to the dismissal of a teacher who 
failed remediation, more recently the SLTA has asked the principal to 
make the final decision after conferring with the team. 

Other mechanisms in Salt Lake buttress the collaborative role of
teachers in the teacher evaluation process and in educational
decisionmaking generally. Teachers have an equal vote on instructional
committees dealing with salaries, in-service training, administrator hiring,
class size, and teacher assignment. Teachers have primary
responsibility for curriculum development and for assisting both new and
experienced teachers in classroom improvement efforts. The SLTA president
is invited to attend the superintendent's staff meetings. Thus, teachers 
play a key role not only in the evaluation process itself, but also in all 
of the functions that support the implementation of evaluation. 

Strategic Compatibility 

Salt Lake City achieves strategic compatibility through shared 
decisionmaking rather than central enforcement. Like all other func- 
tions in the district, teacher evaluation relies on consensual decision- 
making, supported by the various mechanisms through which both 
teachers and parents can influence school operations. Decentralization 
in the context of shared governance permits a form of evaluation that 
is personalized rather than standardized, since the system opens per- 
formance to public scrutiny and comment, while its decisionmaking 
processes guard against unfairness. 

In an effort to ensure that decentralized, democratic decisionmaking 
will result in the right outcomes for the system as a whole, the Salt 
Lake board of education recently sought to focus attention on system- 
wide goals by offering a salary increment to administrators for the 
annual attainment of these goals. It also made optional the earlier
requirement that teachers set personal teaching goals. Many teachers 
had complained that the annual goal-setting process had lost its
significance. Some called shared governance "shoved governance," implying
that they did not have an equal share in decisionmaking. In sum, Salt 
Lake is still trying to balance democratic governance and centralized 
management.

LAKE WASHINGTON: AN ENGINEERING APPROACH
TO INSTRUCTIONAL IMPROVEMENT 

Lake Washington, a well-to-do suburban district of 18,000 students, 
is growing in enrollment. At the hub of the Washington aerospace 
industry, the district's professional clientele understand an engineering 
approach to problem solving, and they support the superintendent's 
integrated systems model for educational reform. 

Despite statewide fiscal retrenchment, per pupil expenditures in 
Lake Washington remain relatively high, in part because the district 
has received public support in passing bond levies for the schools. A 
large portion of the district's budget is used to support a variety of 
staff development activities centered on Madeline Hunter's instruc- 
tional theory into practice (ITIP) approach. Skilled teachers designated 
as ITIP trainers help to maintain a uniform instructional approach in 
the district's staff development and teacher evaluation efforts. 

In contrast to that of Salt Lake City, Lake Washington's teacher 
evaluation process is highly structured from beginning to end. 
Developed in 1976 in response to a state mandate, the evaluation sys- 
tem employs the state criteria in a checklist that the principal uses in 
observations of each teacher twice each year. Pre- and postobservation 
conferences accompany each classroom visit. 

If a teacher receives less than a satisfactory rating on any criterion, 
the principal outlines a detailed personal development plan, which may 
include assistance from an experienced teacher, in-service classes, and 
specific reading assignments. If the teacher fails to improve, the prin- 
cipal places him or her on probation. During the probationary period, 
the principal meets weekly with the teacher to monitor progress toward 
specified performance levels. At the end of the semester, the principal, 
together with central office supervisors, decides the continued tenure of 
the teacher in the school district. 

Although the professed goal of teacher evaluation in Lake
Washington is instructional improvement rather than accountability, the
system is designed to be used for making personnel decisions. District
administrators claim that the evaluation system has resulted in the 
counseling out of about 40 teachers over a four-year period, a figure 
representing about 5 percent of the total teaching force in the district. 

A concomitant emphasis on staff development and rationalized
management is said to have brought a 20-percentile gain in pupil
achievement scores over the same period. The cornerstone of Lake
Washington's approach is the principal's role in managing the
attainment of centrally determined goals and performance standards.

Organizational Commitment

The superintendent began his term by stating that people are the 
most important asset that any school district has and that the most 
important people are those who work with children in schools. One of 
his first acts as superintendent was to eliminate 33 positions in the 
central office and to allocate the $700,000 in savings to staff develop- 
ment. That allocation rose to $1 million in 1983, about 2 percent of 
the district's total budget. 

Staff development is tightly linked to teacher evaluation in Lake 
Washington. In addition to a 30-hour ITIP training course, teachers 
are expected to earn nine credits from in-service course work each year,
and each school receives an annual allocation of $1500 for staff 
development. 

Principals are evaluated on how well they manage staff development 
(including how many of their teachers have taken the ITIP course) and 
on how well they evaluate teachers. When a principal identifies a 
teacher who needs assistance in the classroom, he or she can call on 
one of five full-time ITIP trainers or the ITIP satellite teacher in the 
school, who receives released time to provide this assistance. 

These resources are all brought to bear in the evaluation process. If 
a teacher is performing poorly, the mandated personal development 
plan will include specific staff development courses and ITIP trainer 
assistance in the classroom, as well as increased supervision by the 
principal. 

The superintendent's emphasis on evaluation and his willingness to 
support principals' difficult decisions have made the process meaning- 
ful. Both central office administrators and school principals spend 
about 20 percent of their time on evaluation, and the same formal process
that once resulted in no personnel decisions now leads to concrete
action for improvement or termination. 

Evaluator Competence

As in Salt Lake City, principals evaluate teachers and initiate proba- 
tion procedures for those who are performing poorly. Once probation 
begins, ITIP trainers provide most of the help to teachers needing 
improvement. These trainers are drawn from the ranks of Lake Wash- 
ington teachers and are trained in instructional development. 

Although evaluation and assistance function separately, the princi- 
pals have also received extensive ITIP training. Thanks to this train- 
ing, evaluators and trainers share a common understanding of good 
teaching, and the teacher in difficulty receives help that is consistent 
with the criteria on which he or she is evaluated.

Collaboration

The Lake Washington teachers' association participated in the
development of the district's response to the state teacher evaluation 
mandate. Although the district does not give teachers an equal voice 
in its operations, teachers participate in the evaluation process through 
the involvement of ITIP trainers for teacher assistance and of an asso- 
ciation representative when a teacher is placed on probation. The 
superintendent meets with the head of the teachers' organization at 
least once every two weeks to discuss mutual concerns and problems, 
including but not limited to the functioning of the teacher evaluation 
process. 

Strategic Compatibility 

Of the four case study districts, Lake Washington's engineering
approach to school district management produces the most obvious 
compatibility, even consistency, among school improvement strategies. 
The school board's priorities translate into annual goals and
performance standards for every staff position and for each school. Staff
development, program evaluation, and teacher evaluation are closely 
linked by reference to these goals and by their common emphasis on 
ITIP principles and evaluation strategies. A highly rationalized pro- 
cess of need assessment, planning, and monitoring by which principals 
evaluate and are evaluated provides the tactical glue for these efforts. 

The procedural and substantive uniformity that has contributed to
the effectiveness of Lake Washington's teacher evaluation process now 
challenges its continued usefulness. As instruction has improved, the 
system has begun to recognize the need for differentiated evaluation 
responsive to individual teachers' skills and requirements. Adapting
the system to provide incentives to already competent teachers will 
require striking a balance between the uniformity that permits identifi- 
cation of poor teaching and the flexibility that will inspire further 
development of good teaching. 

GREENWICH: THE PERFORMANCE GOAL APPROACH 
IN A MANAGEMENT TOWN 

Greenwich, Connecticut, a wealthy suburban district of 7500
students, is populated largely by managers and professionals. The
district's performance goal approach to school management and teacher 
evaluation reflects a managerial orientation based on incentives. 
Operationally, the Greenwich approach means that, while centrally
determined goals are used for school management decisions, the goals 
by which teachers are evaluated are not necessarily predetermined sys- 
tem goals. Each year, in consultation with the principal or teacher 
leader (a teacher with part-time administrative status), teachers set 
their own individual goals, plans for achieving the goals, and means for 
measuring whether the goals have been accomplished. Although teach- 
ers may choose system goals, the evaluation process is intended to 
foster individual improvement, and its design allows for individualized 
definitions of growth and development. 

The Greenwich evaluation process includes at least one observation 
and three conferences between the evaluator and teacher each year. 
Teachers complete a self-evaluation report, and evaluators complete an 
open-ended evaluation report, which may be based on both the specific 
annual goals and on general teaching guidelines included in the collec- 
tive bargaining agreement. Evaluation may result in a teacher's being 
placed on marginal status, but this rarely occurs in Greenwich— 
perhaps because of the evaluation process, or perhaps because the 
district's teaching force is highly experienced and highly educated. 

The test of the Greenwich approach, given its individualized nature,
is whether teachers say that it helps them improve their teaching. In 
recent surveys conducted by the district, about half of them said that it 
did. Because it operates carefully, the process forces regularized, 
teacher-specific interaction between principals and teachers and pro- 
vides a focus and recognition for teachers' efforts. Based on a motiva- 
tional theory of management, the approach tries to balance individual 
stages of development and system goals. Whether the process will 
adapt to the personnel decisions that may soon be required in this
declining-enrollment district remains to be seen.

Organizational Commitment 

Teacher evaluation in Greenwich is emphasized in several ways. 
First, in recognition of the fact that evaluation takes time if it is to be 
done well, Greenwich has set a target ratio of 1 evaluator to 20 
evaluatees and has deployed teacher leaders (who spend about half 
time teaching and half time on evaluation) to maintain this ratio in 
schools across the district. The released time and stipends of the 
teacher leaders translate into increased material resources for
evaluation.

Second, both principals and teacher leaders are evaluated on how 
well they perform their evaluation functions. The assistant
superintendents for elementary and secondary education read and critique each
teacher evaluation report for its thoroughness and specificity. They
also check how well the evaluations match up against the lists of
marginal and outstanding teachers that the principals include in their
annual school assessment report.

Improving evaluation performance will likely appear as a personal
goal in the principal's annual review if it has received insufficient
attention. Since teacher evaluation is the major administrative
responsibility of teacher leaders, their continuation in that position is tied to their
performance as evaluators. 

Evaluator Competence

In Greenwich, both principals and teacher leaders evaluate and offer
recommendations for improvement. Their efforts are supported by 66
senior teachers, who receive released time and small stipends to assist
and counsel other teachers on matters of curriculum and teaching
technique. Differentiated staff (experts in different grades and subjects)
provide specific help. Principals and other evaluators receive training
and feedback on evaluation techniques in periodic workshops.

Collaboration 

The Greenwich Education Association (GEA) played a central role 
in developing not only the district's evaluation system but also the 
state's teacher evaluation requirements. The Greenwich system of 
mutually developed goals for teacher evaluation gives teachers a collab- 
orative role in the evaluation process itself. This approach, instituted 
in Greenwich in 1971, was adopted in 1974 as part of Connecticut's 
teacher evaluation requirements at the urging of the GEA president. 

The GEA helped develop the criteria for teacher evaluation found in 
the collective bargaining agreement. A district-level committee
composed of six administrators and six GEA appointees oversees the
implementation. This committee conducts periodic surveys of teachers'
views of the process and makes recommendations for its continued 
improvement. 

Strategic Compatibility 

Greenwich's interest in using teacher evaluation for multiple pur- 
poses is forcing it to confront the evaluation dilemma — the tension 
between the flexibility needed for teacher improvement and the stan- 
dardization needed for system control and personnel decisions. 
Teacher evaluation and staff development have rested on a model of 
self-improvement based on teachers' personal goals. These goals are
articulated in the evaluation process and pursued through both clinical 
supervision and individually selected staff development courses. In a 
sense, each teacher is evaluated against his or her own yardstick, 
appropriate to his or her stage of development and particular teaching 
challenges. 

In recent years, the district's management-by-objectives strategy has 
begun to collide with the personal goal-setting strategy as centrally 
determined goals are accorded precedence. The district's plan to use 
teacher evaluation results as a factor in reduction-in-force decisions 
adds to tensions of individualized goal setting and assessment. These 
strategic inconsistencies may detract from the effectiveness of the 
teacher evaluation system. 

TOLEDO: INTERN AND INTERVENTION PROGRAMS 
IN A UNION TOWN 

Toledo is a working-class, union town with a strong teachers' union.
In the 1970s, a long-standing conflict between the school district
management and the teachers' union, fiscal distress, and a lengthy
teachers' strike led to a series of district school shutdowns. Only the
concerted efforts of administrators and teachers to repair the rift by 
agreeing to share decisionmaking powers reversed the decline in stu- 
dent enrollment and public support for the schools. 

As elsewhere, teacher evaluation in Toledo responds to public 
demands for evidence of quality control in the school system. The 
difference is that in Toledo the teachers' organization took the lead in
defining and enforcing a standard of professional conduct and com- 
petence. 

Toledo's teacher evaluation system differs from others in two
important respects. First, skilled consulting teachers evaluate new teachers
and experienced teachers having difficulty. Second, the evaluation pro- 
cess does not seek to evaluate each teacher each year. Evaluation 
resources are targeted on first-year teachers (interns) and teachers 
assigned to an intervention program. The consulting teachers observe 
and confer with these teachers at least once every two weeks for the
period of the internship or intervention. 

Principals evaluate other teachers annually until the teachers receive 
tenure, and once every four years thereafter. If a teacher qualifies for a 
continuing contract, formal evaluation ceases unless the teacher is
placed in the intervention program. The principal and the union's
building committee jointly decide the assignment of a teacher to 
intervention; the assistant superintendent of personnel and the 
president of the Toledo Federation of Teachers (TFT) must concur in 
the decision. 

Although the express purpose of evaluation in Toledo is to promote 
individual professional growth, evaluation serves as the basis for
making personnel decisions regarding contract status and continued tenure
in the district. In the two years since the intern and intervention pro- 
grams began, 4 of 66 interns were not rehired and 4 of 10 intervention 
teachers were removed from classroom teaching. The intensive supervi- 
sion and assistance provided to intern and intervention teachers serves 
the individual improvement purpose for these teachers, but not to the 
exclusion of accountability goals. 

Organizational Commitment 

Top-level commitment to the evaluation process in Toledo is institu- 
tionalized in the form of an Intern Review Board, chaired in alternate 
years by the assistant superintendent of personnel and the TFT 
president. This board, which reports to the superintendent of schools,
ensures the smooth functioning and continued improvement of the 
intern and intervention programs; it also serves as a forum in which
deficiencies in the regular teacher evaluation process come to light at 
the top of the system. The composition and visibility of the Intern 
Review Board serve to direct attention to the evaluation function as it 
operates throughout the district.

Toledo has created time for evaluation by using consulting teachers
as the primary evaluators of interns and intervention teachers. 
Depending on the number of teachers they are supervising at a given 
time, the consultants are released from classroom teaching responsibili- 
ties full- or part-time for up to three years. A full-time consultant may 
supervise no more than ten interns or intervention teachers at a time. 

An annual allocation of $80,000 supports the costs of substitute 
teachers for consultants on released time, stipends and in-service train- 
ing for the consultants, and curriculum and other materials used in 
assisting the interns and intervention teachers. These resources are 
devoted to the teachers needing the most assistance. Over the past two 
years, Toledo has spent an average of $2000 per intern or intervention 
teacher to provide this level of clinical supervision. 







Evaluator Competence 

In Toledo, expert consulting teachers both evaluate and assist first- 
year teachers, but the principal files a summary evaluation report on 
the intern's nonteaching performance. The principal assumes the
evaluation role after the teacher's first year. Consulting teachers are 
then used to provide classroom assistance to other teachers on a
voluntary basis at the teacher's request (or principal's encouragement) or on
an intensive and mandatory basis when a teacher is assigned to the
intervention program. 

The Intern Review Board selects consulting teachers after carefully 
screening candidates' qualifications, including teaching, leadership, and 
human relations skills. An in-service program prepares consulting
teachers for their roles, and the Intern Review Board provides a 
mechanism for assessing the quality of consulting teachers' efforts. 

Collaboration

The Toledo Federation of Teachers was the primary initiator of the
intern and intervention programs. The TFT president had tried to 
negotiate a peer review system for first-year teachers for nearly a 
decade before it was accepted as part of the 1981 collective bargaining 
agreement. The administration extended the concept to include an 
intervention program for teachers experiencing difficulty in the class- 
room. The principal and the TFT building committee together assign 
teachers to intervention. 

The Intern Review Board, composed of five TFT appointees and
four administrators, administers the intern and intervention programs. 
The board meets throughout the school year to guide the evaluation 
process and to oversee the efforts of the consulting teachers. 

In a further collaborative effort, teachers in Toledo schools elect
their department chairpersons and building representatives to staff 
development committees. At the district level, TFT-appointed 
representatives serve on all committees relating to curriculum, testing,
and staff development. The superintendent and his staff meet at least 
monthly with TFT leaders to discuss educational policy development 
and implementation. Toledo teachers have a strong voice in virtually
every area of instructional policy.



Strategic Compatibility

In Toledo, thanks to a central balance of powers, committees com- 
posed of administrators, teacher representatives, and (in some cases) 
parents develop new strategies and policies for school improvement and
oversee their implementation. Teachers' professional empowerment is
expressed through union representation and bargaining power in top- 
level decisionmaking. 

The prominent role of Toledo teachers in the intern and interven- 
tion programs contrasts with the traditional role of teachers in the 
evaluation process that has as its goal teacher protection rather than 
teacher participation. However, increased teacher power in the shaping 
of other teaching policies, including staff development, may ultimately 
increase their responsibilities as partners in educational improvement 
as well. 

The Toledo school district, in operating on the basis of negotiated 
responsibility, is moving toward collaborative control over instructional 
quality. But because teacher responsibility limits management's 
decisionmaking prerogatives while also potentially undermining teacher 
protection, it can threaten both management's and union's traditional 
power bases. Thus, if this approach is to succeed, management and 
union will have to maintain a balance of their powers in all areas of
educational policymaking. Otherwise, power struggles will fragment 
the educational process and defeat the public interest. 



SIMILARITIES OF IMPLEMENTATION 
THAT MAKE THESE SYSTEMS WORK 

Each case study district has demonstrated organizational
commitment to teacher evaluation, procedures for ensuring evaluator
competence, collaboration with the teachers' organization and individual
teachers, and compatibility of teacher evaluation with other district 
management strategies. These four factors underlie the success of 
these evaluation systems.

Organizational Commitment 

Personnel evaluation discomforts any organization. It contains the 
potential for misunderstanding, miscommunication, and anxiety on the 
part of both evaluators and those whom they evaluate. Good evalua- 
tion, however, offers the opportunity to improve organizational morale 
and effectiveness. It can foster concrete understanding of
organizational goals and regularize communication among school personnel
about the actual teaching work of the organization. It can also deliver
the message that the organization needs these people and their efforts 
to accomplish its goals. 

To make evaluation more than an isolated, peripheral activity, an 
organization must insist on the importance of evaluation from the top 
levels of the organization, institute concrete mechanisms for translat- 
ing that insistence into action, and provide sufficient resources to the 
evaluation process. Evaluation cannot be considered an add-on func- 
tion if it is to succeed. It must be a central mission for the organiza- 
tion, and it must be supported by resources that enable its results to be 
used. 

Each case study district developed its own strategy for focusing
organizational attention on the evaluation process. Although their
approaches differ in specifics, they all recognize that a key obstacle to 
successful evaluation is time— or, more precisely, the lack of it— for 
observing, conferring with, and, especially, assisting teachers who most 
need intensive help. Time for these functions must compete with other 
pressing needs unless human resources for the functions are expanded 
and incentives for using those resources are continuous and explicit.

Evaluator Competence

Valid, reliable, and helpful evaluation requires evaluators who
recognize good teaching (and its absence) and who know how to improve
poor teaching when they find it. Evaluator competence is probably the
most difficult element of the process. The best supported and most 
carefully constructed process will founder if those responsible for 
implementation lack the necessary background, knowledge, and
expertise.

Evaluator competence requires two qualities: the ability to make
sound judgments about teaching quality and the ability to make 
appropriate, concrete recommendations for improvement of teaching 
performance. If evaluation processes were designed solely to get rid of 
poor teachers, the second quality would not be needed. However, most 
evaluation processes also intend to improve instruction, and even those 
that strive for accountability must, in the interest of fairness, include a 
real opportunity for improvement before a teacher is dismissed. Thus,
those who evaluate must both judge proficiently and help effectively. 

The four case study districts all recognize this dual function of 
evaluation, and all, to varying degrees, divide the function between
principals and expert teachers. In Lake Washington and Salt Lake 
City, principals evaluate teachers and initiate probation or remediation 
procedures for those who are performing poorly. Once probation or 
remediation begins, however, expert teachers (ITIP trainers in Lake
Washington and teacher specialists in Salt Lake) provide most of the
help to teachers needing improvement. Salt Lake also operates a peer 
adviser program for first-year teachers in which skilled, experienced 
teachers receive small stipends and released time to help and counsel 
new teachers. 

In Toledo, expert consulting teachers both evaluate and assist first- 
year teachers, but the principal files a summary evaluation report on 
the intern's nonteaching performance. The principal assumes the 
evaluation role after the first year. Consulting teachers are then used 
to provide classroom assistance to other teachers on a voluntary basis 
at the teacher's request (or principal's encouragement) or on an inten- 
sive and mandatory basis when a teacher is assigned to the interven- 
tion program. 

In Greenwich, both principals and teacher leaders evaluate and offer 
recommendations for improvement. Their efforts are supported by 66 
senior teachers, who receive released time and small stipends to assist 
and counsel other teachers on matters of curriculum and teaching tech- 
nique. 

Several considerations underlie the division of evaluation and assis- 
tance between administrators and teachers who have been selected for 
their teaching and counseling abilities. The first consideration is time. 
Even a conscientious and competent principal who gives evaluation 
high priority has other administrative duties that compete for his or 
her time. He or she certainly lacks the time to help a teacher who 
requires intensive day-to-day supervision. Someone for whom it is a 
primary responsibility must provide the help for such improvement. 

The second consideration in dividing these responsibilities — one 
often cited in the literature on teacher evaluation — involves the possi- 
bility that role conflict precludes one person's serving as both judge 
and helper. According to the theory, the judgmental relationships of 
evaluation inhibit the trust and rapport that a helper needs to motivate 
a teacher to improve his or her performance. This theory received lim- 
ited empirical support in our studies. 

To the extent that role conflict exists, however, it does not seem to 
operate in a simple, straightforward manner but depends, rather, on
the evaluator's temperament, the incentive structure in the school dis- 
trict, and the prevailing ethos of the school. Nonetheless, some separa- 
tion of evaluation from assistance (if only by the involvement of a 
committee rather than a single decisionmaker) seems to have proved a
productive strategy in these districts. 

The final consideration goes to the heart of the evaluator competence
issue. Principals are not always chosen for either their
evaluation ability or their outstanding teaching ability. In fact, an
elementary school principal may never have taught in an elementary 
school, and a secondary school principal is not likely to have 
knowledge of all areas of the high school curriculum. 

While principals may know or be trained to recognize the presence
or absence of generic teaching competence, the task of providing con- 
crete assistance to a teacher in trouble often requires more intimate 
knowledge of a particular teaching area than a principal is likely to 
possess. The logical solution to this dilemma is to assign the assis- 
tance function to one who has already demonstrated competence in an 
area of teaching expertise. 

The use by the case study districts of a differentiated staffing model 
for teacher evaluation and assistance allows them to deploy district 
resources and expertise efficiently. In all cases, committees composed 
of both teachers and administrators choose the variously titled teacher 
experts on the basis of teaching competence and interpersonal skills.
The expert teachers are assigned to provide as close a match as possible
to the teaching area of the teacher whom they are to supervise
and/or assist.

In addition, all case study districts provide some form of in-service 
training for evaluators on evaluation goals, procedures, and techniques. 
This training varies in emphasis and frequency. Lake Washington provides
the most intensive evaluator training of the four districts. Principals
attend a two-week workshop every summer which includes study
of ITIP techniques, clinical supervision skills, and evaluation methods. 
During the school year, they attend monthly seminars that reinforce 
and expand on many of the same topics. 

Ultimately, though, supervision of the evaluation process in each of 
the four districts provides the most important check on evaluator com- 
petence. All four districts have mechanisms for verifying the accuracy 
of evaluators' reports about teachers. These mechanisms force evaluators
to justify their ratings in precise, concrete terms. Outside the formal
evaluation process, mechanisms for controlling instructional quality
(Salt Lake's review-of-services process and the school performance
assessments in Lake Washington and Greenwich) increase the
probability that poor teaching performance will be identified even when 
evaluation reports fail. 

Collaboration in Development and Implementation 

In the four case study districts, the teachers' organization has collaborated
with the administration in the design and implementation of the
teacher evaluation process. The extent and nature of the collaboration
between teachers and administrators in the four districts varies accord- 
ing to their political contexts and organizational characteristics. They 
have in common, however, means for maintaining communication 
about evaluation goals, processes, and outcomes so that im- 
plementation problems can be addressed as they occur. Consequently, 
evaluation is not an adversarial process, but one in which teachers and 
administrators work together to improve the quality of evaluation. 

Strategic Compatibility 

Most school districts function with a mixture of policies and procedures,
some of which work together and some of which do not.
These case studies support the idea that a process as fragile as teacher
evaluation must be compatible with at least those other district policies
that define the nature of teaching.

In each case study district, teacher evaluation supports and is sup- 
ported by other key operating functions in the schools. Evaluation is 
not just an ancillary activity; it is part of a larger strategy for school
improvement. The form and function of evaluation make it compatible 
with other tactics adopted to accomplish other district goals. 

The success of teacher evaluation depends finally on the delimitation
of its role in the school system. No single evaluation process can
simultaneously serve all of the possible goals of evaluation well. Nor
can evaluation serve alone as the tactical glue for diverse approaches to
school improvement. In a practical sense, appropriate strategies for
teacher evaluation explicitly address a high-priority goal of the school
organization without colliding with other functions or goals. This
means that the purposes of teacher evaluation in the organizational
context must be carefully defined. It also means that new priorities
may require explicit changes in teacher evaluation.

IV. EVALUATING THE TEACHER EVALUATION SYSTEMS



The four case study teacher evaluation systems succeed in several 
ways. First, and relatively atypically, the school systems implement 
them as planned. Second, all actors in the system understand them. 
Third, the school systems actually use the results. 

In varying degrees, the evaluation processes produce reliable, valid 
measures of teaching performance and are used for teacher improvement
and personnel decisions. We examine below how the four systems
attempt to ensure reliability, validity, and utility. In the course of
this examination, we discuss the capabilities and limits of each 
approach. 



RELIABILITY 

Reliability in evaluation refers to the consistency of measurements 
across evaluators and observations. To ensure reliability, some evaluation
systems use a detailed observation instrument that specifies
behaviors to be observed and guidelines for rating those behaviors. 
Other systems train evaluators to use the same criteria the same way 
for each evaluation. Still others develop a common standard and have 
evaluators discuss and critique each other's evaluations.

The degree of reliability required of a teacher evaluation system 
depends on the use to be made of the results. Personnel decisions 
demand the highest reliability of evaluation results. Evaluation criteria 
must be standardized and evaluators must apply these criteria with
consistency when the results are to be used for personnel decisions 
regarding tenure, dismissal, pay, and promotion. The evaluation sys- 
tem may tolerate a lower degree of reliability when the results are to be 
used, for example, for formative assessments or informational purposes. 
Even for these purposes, however, reliability cannot be disregarded, for 
it affects both teacher morale and the perceived legitimacy of the pro- 
cess. Variability may replace reliability if the goal is to encourage indi- 
vidual development based on personally defined needs. 

The case study districts use different methods and devote various 
levels of attention to reliability concerns. Of the four, Toledo's intern 
and intervention programs take the most comprehensive approach to 
ensuring reliability. Lake Washington, Greenwich, and Salt Lake City
have a more difficult task because they use principals as primary
evaluators and evaluate all teachers, thus increasing the number of 
raters, ratees, and observations to be standardized. 

At least three sources of variability may make teacher evaluation
unreliable: (1) variability in how evaluators interpret what they observe
or what criteria they stress in making judgments; (2) variability in the 
evaluations of a single evaluator, i.e., whether the evaluator uses the 
same criteria and applies them consistently when observing different 
teachers; and (3) variability in observations, i.e., whether the evaluator 
uses the same criteria and applies them in the same manner when 
observing the same teacher on separate occasions. 
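
As an illustration only (it is not part of the study), the first and third sources of variability listed above can be made concrete with a simple percent-agreement calculation. The sketch below assumes a hypothetical three-point rating scale and invented ratings; the function names and data are assumptions introduced here, and percent agreement does not correct for agreement that would occur by chance.

```python
from itertools import combinations


def agreement_rate(ratings_a, ratings_b):
    """Return the fraction of paired ratings that are identical."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def mean_pairwise_agreement(ratings_by_source):
    """Average agreement_rate over every pair of rating lists."""
    pairs = list(combinations(ratings_by_source.values(), 2))
    if not pairs:
        raise ValueError("need at least two rating lists to compare")
    return sum(agreement_rate(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: three evaluators rate the same four teachers.
# Low agreement here corresponds to source (1), variability among evaluators.
across_evaluators = {
    "evaluator_1": ["satisfactory", "outstanding", "unsatisfactory", "satisfactory"],
    "evaluator_2": ["satisfactory", "satisfactory", "unsatisfactory", "satisfactory"],
    "evaluator_3": ["satisfactory", "outstanding", "unsatisfactory", "outstanding"],
}

# Hypothetical data: one evaluator rates the same teacher on four criteria
# during two separate visits. Low agreement here corresponds to source (3),
# variability across observations.
across_occasions = {
    "visit_1": ["satisfactory", "satisfactory", "unsatisfactory", "satisfactory"],
    "visit_2": ["satisfactory", "satisfactory", "satisfactory", "satisfactory"],
}

evaluator_agreement = mean_pairwise_agreement(across_evaluators)
occasion_agreement = mean_pairwise_agreement(across_occasions)
```

A district that uses results for personnel decisions would want both figures to be high; a district that uses results only for individual development might accept a lower figure across evaluators while still needing consistency across observations of a single teacher.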



Toledo

Toledo's evaluation process addresses all of these potential sources
of unreliability by using a small number of evaluators, a reporting process
that fosters common assessment criteria and applications, and frequent
observation and consultation. The small number of consulting
teachers who evaluate reduces the range of variability among evaluators.

More important, the consulting teachers discuss their observations 
and evaluations with the Intern Review Panel several times a year. 
Even consultants who have no current assignments attend the meetings.
These discussions make the rating criteria explicit and concrete.
In the discussions, the consulting teachers develop a common framework
for rating teaching characteristics "outstanding," "satisfactory,"
or "unsatisfactory." The effect is to reduce variability across evalua- 
tions and across observations. 

The use of a small group of evaluators in many schools increases 
system-wide reliability. Although evaluators may consider school con- 
text in judging the appropriateness of teaching methods used with a 
particular group of students, they are unlikely to accept a lower standard
of teaching in one school than another. In a less centralized system
using more evaluators, the evaluator's frame of reference may be
only a single school and evaluations will vary more. 

Finally, frequent classroom observations enhance the reliability of
the process. (They also heighten its validity.) Evaluation based on
observations made at least twice a month over the course of an entire 
school year eliminates the common complaint that a single observation 
cannot adequately measure teaching ability. The equally intensive
consultation process, which incorporates joint goal setting and problem
solving, also increases the probability that evaluator and evaluatee will 
arrive at a common understanding of what is being observed and 
evaluated. 
The Toledo intern and intervention programs also increase reliability
by limiting the number of teachers to be evaluated and by allowing
the small group of expert teachers who evaluate them released time.
Thus, the evaluator is able to work intensively with the teacher being 
evaluated. 

Lake Washington 

The Lake Washington, Greenwich, and Salt Lake City teacher
evaluation processes require an administrator to evaluate every teacher
every year. This requirement decreases evaluation reliability by
increasing the chances of variability among evaluators and variability
across evaluations and observations. Evaluator training helps to offset
these sources of unreliability to varying degrees in the three districts.

In Lake Washington, principals receive ongoing training in ITIP 
principles, clinical supervision, and observation techniques. This train- 
ing enables evaluators to interpret what they observe in similar ways. 
Most teachers and administrators in Lake Washington feel that, with a
few exceptions, principals make fair and consistent assessments and 
that the standards do not vary widely from school to school. 

The "evaluative criteria checklist" in Lake Washington is also 
intended to promote reliability by specifying 29 behaviors to be 
observed under the seven evaluation criteria. Since the evaluators 
must rate each behavior listed on the checklist, the instrument helps to 
focus their attention on these aspects of teaching. Although the 
behaviors are not precisely defined (they include, for example,
"develops plans," "teaches the curriculum," and "prepares materials"),
the list may prevent evaluators from ignoring certain teaching activi- 
ties or from applying the criteria unevenly to different teachers. 

However, the requirement that all teachers be evaluated each year 
decreases the time that an evaluator can devote to any one teacher. As
a result, teachers who are having obvious difficulty receive more atten- 
tion than those who are not. As one principal put it: 

I have to evaluate too many people. Four or five people are taking 
all of my attention and I am just doing lip service for the rest. There 
is no way to fit all of this in within the present system and state con- 
straints. So I just go through the motions with half of them. 

Perfunctory evaluation reduces the reliability of judgments. Principals
acknowledge that some teachers escape being placed on probation
because evaluators cannot afford the time it takes to administer the 
probationary process. Overextension of the evaluator's time results in 
unreliability across evaluations.

Greenwich

Greenwich became concerned about reliability in evaluation when it 
faced the prospect of using evaluation results for making personnel 
decisions related to reductions-in-force. The evaluation process was
originally intended to allow for varying standards of progress depend- 
ing on teachers' felt needs and stages of development. That conception 
required reliability across observations of a single teacher but not 
especially across evaluations of different teachers or even among
evaluators as a group. 

The flexible character of the Greenwich evaluation process shows in 
the instrument used and the way in which criteria are applied. The 
evaluation form includes three spaces, labeled "description of observation,"
"summary comments," and "teacher comments." It lacks a
checklist and specific ratings to be applied. 

Evaluators may, however, draw on a list of guidelines for profes- 
sional performance as deemed appropriate. The guidelines include 
aspects of performance ranging from "shows evidence of planning and 
good organization" to "interprets educational programs, procedures, 
and plans to the public," "has mature understanding of own and oth- 
ers' problems," and "conducts self in an ethical manner." These are 
high-inference variables, some of which are not easily observable. The 
guidelines are used selectively in conjunction with mutually developed 
individual teacher goals as criteria for evaluation. 

This process provides evaluation of low reliability because the cri- 
teria and the manner in which they are applied vary (intentionally) 
from teacher to teacher. Low reliability does not invalidate the process 
for its intended purpose of individual staff improvement, but it limits 
the applicability of the process to other purposes that would require 
more highly standardized comparisons of teachers. 

The most important feature of the process for individual staff 
improvement is reliability across observations of a single teacher. If 
the evaluator is to gauge progress, he or she must apply the selected, 
individually pertinent criteria consistently across observations. 

Several features of the Greenwich process encourage this type of 
reliability: the teacher's development of an achievement plan and 
assessment criteria, the requirement that the goals be measurable or 
observable, and the year-end assessment by both the evaluator and 
teacher as to whether the goals have been "fully" or "partly accomplished"
or "missed." However, the evaluator's summary observation is
expected to include observations about other aspects of the teacher's 
performance. 

As teacher evaluation has acquired new purposes in Greenwich, 
administrators have made efforts to enhance the other forms of reliability.
Central office supervisors read and critique all evaluations for
their clarity and precision of description. Training sessions for evaluators
focus on critical discussion of these evaluations to improve observation
and reporting practices so that the reports provide more concrete
and potentially generalizable information.

At the end of each school year, principals must include lists of
"outstanding" and "marginal" teachers in their school assessment
documents. Efforts are made in the training sessions to assess whether the
evaluation reports for teachers identified as marginal or outstanding 
contain adequate data for these judgments.^ 

Salt Lake City 

Salt Lake City's regular, preremediation evaluation process resembles
that used in Greenwich, including its sources of unreliability. The
Salt Lake evaluation process lacks observation instruments and check- 
lists to guide the annual principal-teacher conference and classroom 
observation. At the conference, the principal and teacher consider 
which of a list of system-wide, school-wide, or personal goals the 
teacher will focus on for the year. 

The decentralized management structure in Salt Lake, which produces
different school goals and emphases, also encourages diverse
standards for teacher evaluation. As in Greenwich, the evaluation pro- 
cess does not apply standardized criteria uniformly across teachers. 
Unlike in Greenwich, principals do not receive ongoing evaluation 
training to enhance reliability across evaluators. 

The decisionmaking process for placing a teacher on remediation is 
not standardized. However, once a teacher is placed on remediation, 
some standardizing elements are introduced into the process. The cen- 
tral office selects one representative for each remediation team from a 
small pool of five learning specialists. The learning specialists bring a 
measure of consistency to the process because each serves on multiple 
remediation teams (thereby increasing reliability across evaluations) 
and, presumably, they share a common viewpoint about the goals and
conduct of the remediation process. 

However, the two SLTA-appointed representatives on the remediation
team are drawn from a large pool. They bring less consistency to
the process because each member serves on fewer teams and they 
receive no training to offset different views of evaluation and good 
teaching. The principal is the fourth member of the team and the final 
decisionmaker as to the success or failure of remediation. The team's
involvement is intended to increase the reliability of that decisionmaking
process by countering the biases held by any single member.

^Although the evaluation process may be used to place a teacher on marginal status,
which would trigger a more intensive series of observations and counseling, this feature
of the evaluation process is rarely used.

The multiple views of the team members may also help rationalize
the application of evaluation criteria to teachers on remediation. A list
of "Teaching Expectancies" is used to guide the remediation effort, but
it is difficult to use for diagnosis because it combines behaviors (e.g.,
"adjusts techniques to different learning styles"), outcomes (e.g.,
"evidence that student is working at task"), attitudes (e.g., "all students
can learn"), and school conditions (e.g., "availability of resources
personnel") in a single list. Such criteria are difficult to apply reliably.

The team approach for personnel decisions is necessary in Salt Lake 
to offset the other sources of unreliability in the evaluation process. It 
reduces arbitrary decisionmaking and obtains agreement about the 
appropriateness of an important personnel action. Thus, it has political
as well as methodological value.

In sum, an effective evaluation system needs more than reliability. 
In fact, depending on the major goals of evaluation, it may not require 
reliability. A highly standardized, reliable process may not even suit 
some purposes. In the next section, we discuss validity and how the 
purposes of evaluation must guide judgments of its validity. 

VALIDITY

The validity of a teacher evaluation process depends on its accuracy 
and comprehensiveness in assessing teaching quality as defined by the 
agreed-on criteria. Although LEAs may seek to finesse the issue of 
validity by striving for measurement reliability in their evaluation process,
they cannot ignore the validity of the process when they use its
results as a basis for personnel decisions. 

The criteria, the process for collecting data, and the competence of 
the evaluator contribute to the validity of an evaluation process. The
purpose of evaluation (the inference to be drawn, the help to be given,
the decision to be made) determines the validity of the evaluation
process. In short, the process must suit the purpose if the results are to be
judged valid. 

The criteria for judging minimal competence must be standardized, 
generalizable, and uniformly applied. Finer distinctions among good, 
better, and outstanding teachers require nonstandardized, i.e., differen- 
tial, criteria. 

Teaching research has demonstrated that effective teaching 
behaviors vary for different grade levels, subject areas, types of stu- 
dents, and instructional goals. Thus, relative teacher competence can- 
not be assessed on the basis of highly specified, uniform criteria. 
When a school district adopts a single set of broad criteria, it must 
differentiate these criteria for specific applications. Excellence above
all must be measured by broad, nonstandardized criteria.

Teaching competence may be conceived as a continuum. The
further one moves along the continuum from minimal competence to
excellence, the more wide-ranging and inferential the sources of data
and the less uniform and generalizable the specific indicators.

The absolute minimum requirement for acceptable teaching is the
ability to run a nondisruptive classroom. Our studies revealed that,
more than any other problem, a disruptive classroom will trigger a special
evaluation of a teacher. A teacher who cannot manage a classroom
is presumed not to be teaching and to be creating a disturbance that
disrupts the school. A disturbance that becomes visible outside the
school disrupts organizational stability. Thus, teachers who lose control
of the classroom are the first to be identified for possible separation.

Beyond acceptable classroom management, minimal competence
demands mastery of subject matter and a repertoire of teaching techniques.
Ideally, a teacher will not be fully certified until he or she has
mastered both. Many teacher evaluation processes focus on assessing
minimal competence. 

Beyond minimal competence, a teacher must not only master subject 
matter and the repertoire of techniques but also must make appropri- 
ate judgments about when those techniques should be applied. This 
quality makes teaching a profession. A professional teacher has suffi- 
cient knowledge of subject matter and techniques to make appropriate 
decisions about instructional content and delivery for different stu- 
dents and classes. In other words, the professional teacher is able to 
ascertain students' needs and to meet them.

Beyond the ability to make appropriate teaching decisions lies the 
ability to diagnose unusually difficult learning problems, to deliver an 
unusually wide variety of instruction, and to inspire unusually creative 
or analytical thinking by students. This quality is excellence in teaching,
which, like excellence in all fields of human endeavor, is rare.

The demands of evaluation differ along this continuum. The evaluator
needs no special expertise to recognize that a classroom is out of
control. To evaluate minimum competence, the evaluator must be able
to observe the presence or absence of generic teaching skills. However,
to evaluate the appropriateness of teaching decisions, the evaluator
must know the subject matter, the pedagogy, and the classroom
characteristics of the teacher being evaluated. The evaluator's level of
expertise must at least equal, if not exceed, that of the teacher being
evaluated.

As in other professions, judgments of the appropriateness of teaching
decisions must rely on prevailing standards of practice. The judgment
of excellence in teaching, however, must be based on superior
standards of practice. Thus, the evaluator must have a high level of
expertise to judge excellence.

All four case study districts claim to hold a professional conception
of teaching. Yet their evaluation processes conform to this conception
in varying aspects and to varying degrees. What the processes seek to
measure and what they actually measure depends on who is being
evaluated, by whom, and for what purpose. Those who are receiving
intensive supervision are evaluated differently, at least in degree, from
those whose performance is merely being checked. Those subject to an
imminent personnel decision are evaluated differently from those who
are not; evaluations made by generalist administrators differ from
evaluations made by teaching specialists.

Although differential evaluation emphasis is valid, serious problems
arise when a process that is valid for one purpose is applied to other
purposes or goals. A process that produces a valid measure of incompetence
may ill suit the measurement of degrees of competence. A process
that reveals the extent of improvement in particular competences
or areas of performance may not work for ranking teachers according
to overall competence. Thus, in adopting a teacher evaluation system,
a school district must ensure that the system suits its evaluation goals. 

In discussing the validity of the teacher evaluation processes in our 
four case study districts, we distinguish between how the processes 
function for determining both minimal competence and degrees of com- 
petence. An LEA should base tenure and dismissal decisions on 
minimal competence. It should determine the degree of competence as 
a basis for helping teachers improve and making performance-related 
promotion and pay decisions. 

Evaluation of Minimal Competence

For the most part, evaluation by administrators in the case study 
districts stops short of judging professional competence as we have 
defined it above. In Salt Lake City, Lake Washington, and Toledo, the 
presence or absence of minimal teaching competence, especially the 
inability to manage the classroom, triggers remediation, probation, or 
intervention. Most of the teachers placed in these programs cannot
control a classroom. The lack of pedagogical knowledge or sophistica- 
tion may not, by itself, result in special treatment by the evaluation 
process. 




Indeed, in all of these systems, principals admit that they spend lit- 
tle time evaluating teachers who appear to be competent; teachers not 
subject to special treatment allege that their evaluations have not given
them constructive criticism relevant to their area of teaching expertise. 
Competent teachers do not necessarily consider the process useless. 
Rather, they criticize evaluations for providing too few observations 
and evaluators for making comments that fail to relate specifically to 
the pedagogical demands of their particular teaching assignment. 

These criticisms do not indict the validity of evaluation systems; 
they indicate, however, that the systems are not especially designed to 
produce valid measures of the degree to which teachers have attained 
teaching competence in their particular areas of expertise. The 
strength of the processes lies in their ability to identify teacher
incompetence. With respect to this purpose, the processes enhance the
validity of evaluators' judgments in two ways.

First, all four evaluation processes require careful documentation of 
teaching behaviors resulting in unsatisfactory ratings. This documen- 
tation enables someone other than the evaluator to verify that the 
teaching criteria have been applied appropriately. The use of multiple 
observers by Salt Lake's remediation teams helps to foster objectivity. 
In Lake Washington, evaluator training in ITIP principles (focusing on
a process of teaching that purportedly transcends subject matter
differences) provides a common framework for evaluation judgments. In
Toledo, a committee decides on intervention and judges how the inter- 
vention process is progressing. 

Second, the districts require multiple observations for evaluations. 
If inferences about teaching competence are to support personnel deci- 
sions, they must be based on an adequate sample of teaching perfor- 
mance. Because the goal of evaluation is to certify minimal com- 
petence and to determine whether the teacher under observation is 
progressing toward achieving minimal competence, evaluators must be 
able to assess the generalizability of observed behaviors. The processes for 
remediation, intervention, and probation in Salt Lake, Toledo, and 
Lake Washington provide explicitly for multiple observations and 
devote resources in the form of evaluator time toward that end. 

The criteria and instruments used in these three districts bear only 
indirectly on the judgment of minimal competence. The criteria 
include instructional skill (generically defined to mean the ability to 
plan, organize, deliver, and evaluate instruction, to help students 
develop good work habits, etc.); classroom control and discipline; sub- 
ject matter knowledge (e.g., keeping abreast of new ideas); and personal 
characteristics (e.g., dependability). The checklists include behaviors 
(e.g., "teaches the curriculum"), competences (e.g., "ability to 
motivate"), and outcomes (e.g., "student behavior demonstrates accep- 
tance of learning experience"). 

One can question whether specific items on the checklists are neces- 
sary or sufficient conditions for judging teacher competence. For 
example, does the presence of lesson plans mean teacher competence? 
Does the failure to follow the curriculum guide mean teacher incom- 
petence? Does the degree of student disruptiveness or passivity reflect 
the competence of the teacher? 

In reality, evaluators judge incompetence as a whole, and the speci- 
fied criteria have less bearing on the validity of the judgment than does 
the competence of the evaluator. Applying the criteria in a way that 
results in a defensible inference requires expertise on the part of the 
evaluator. The checklists merely focus the evaluator's attention on 
specific behaviors to be observed. The evaluator most likely makes a 
judgment and then rationalizes it against the checklist criteria. 

Toledo and Lake Washington have taken aggressive steps to ensure 
validity. Toledo chooses consulting teachers because they are recog- 
nized by their peers and administrators as experts in their teaching 
areas. The consultants are matched by teaching area to the interns 
they evaluate. Furthermore, the Intern Review Board forces the con- 
sulting teachers to make clear the standards of practice implicit in 
their judgments by requiring documentation of teaching events, sugges- 
tions made, and concrete reasons for outstanding or unsatisfactory rat- 
ings. Consulting teachers must demonstrate their ability to relate 
observed behaviors to competence ratings. 

Lake Washington trains evaluators in the same teaching principles 
that guide teacher staff development. This training enhances the 
correlation between the evaluators' judgments and the standard of 
practice adopted by the district. To the extent that the ITIP principles 
themselves are valid indexes of teacher competence, this training 
enhances the validity of the teacher evaluation process. It creates a 
common language among principals, teachers, and trainers. The com- 
mon training and resultant shared language allow evaluators to com- 
municate their observations and assessments with concrete, readily 
understood referents. 

Salt Lake City enhances validity indirectly by referring decisionmak- 
ing to a committee containing two experts. The validity of evaluation 
judgments rests on the consensus of the committee. The presence of a 
learning specialist and a teacher from the relevant subject area or 
grade level on the committee increases the prospect that defensible 
inferences about teacher competence are made. 

The evaluation of minimal competence, based on periodic observa- 
tions of classroom performance, attends to the effective control of 
students and to the presence of certain teaching behaviors. These 
behaviors relate to planning, setting objectives, teaching a lesson 
related to the objectives, and evaluating whether the objectives have 
been attained. These low-inference variables suffice for judging 
minimal competence, and a moderately skilled observer can judge 
them. 

This type of evaluation does not address pedagogical knowledge and 
judgment. Pedagogical knowledge and judgment relate to the appropri- 
ateness of teaching objectives for meeting certain goals or for different 
types of students, the relative effectiveness of alternative strategies for 
presenting particular types of content, the relationship among lessons 
taught throughout the course of a week, a month, or a semester, the 
variability of teaching techniques, the theoretical soundness of content 
and strategy decisions, and the depth of subject matter knowledge pos- 
sessed by the teacher and imparted to the student. 

The evaluation of minimal competence also treats neither creativity 
and innovation in teaching nor student motivation beyond the ability 
to induce compliance with work requirements. Furthermore, it ignores 
the multiple, long-term consequences for students of the overall class- 
room experience, such as continued enthusiasm for learning, broaden- 
ing of learning styles, the ability to apply concepts or developed skills 
to diverse situations later on, and increased self-confidence. In short, 
evaluation for judging minimal competence attends to the form rather 
than the substance and to the immediate rather than the long-term 
effects of teaching. 

Evaluation of Degrees of Competence 

Evaluation for judging relative competence must take into account 
the probable multiple short- and long-run consequences of teaching 
behaviors and the substantive basis for teaching judgments. This type 
of evaluation depends on high-inference variables, e.g., how well does 
the teacher plan, within and across lessons, to impart the structure of 
knowledge in the discipline, to account for the students' levels of 
development and prior learning, and to achieve the immediate and 
long-range goals of instruction? How well do the teacher's strategies 
and techniques meet the changing needs of students over time, 
integrate different objectives, and foster the development, application, 
and transference of student skills and abilities? These high-inference 
variables require the judgment of an expert observer. 

Three basic characteristics of an evaluation process designed to 
judge minimal competence may limit its validity for judging relative 
competence above a minimal level. These limitations stem from the 
expertise of the evaluator, the format of evaluation, and the evaluation 
criteria. 

A generalist evaluator trained in evaluation techniques can ascertain 
the presence or absence of minimal teaching competencies in a few 
visits. Thus, principals can make defensible decisions about whether a 
teacher should or should not be placed on probationary status. Class- 
room management problems are difficult to hide, even on a prear- 
ranged observation day and even if students tend to behave better for 
their principal. Gross ineffectiveness in communication also is hard to 
disguise. 

Under this method of evaluation, however, the principal, who is 
usually a generalist, cannot assess subject area competence and the 
quality of ongoing classroom activities. The kind of sophisticated, 
knowledge-based assessments required for valid ratings beyond satis- 
factory demands an expert in the teaching area of the evaluatee. 

Furthermore, relative competence cannot be assessed solely on the 
basis of a few discrete classroom observations. The format of evalua- 
tion must reach beyond observed teaching behaviors on a given day or 
days. The quality of ongoing classroom activities depends on how what 
happens today relates to what happened yesterday and last week, as 
well as what will happen tomorrow and thereafter. 

Because the internal coherence and integrity of teaching form a con- 
tinuum, the evaluation of relative competence requires a more holistic 
set of data about teaching activities than can be gleaned from teacher 
performance during a few classroom observation visits. It requires a 
longitudinal assessment of teacher plans, classroom activities, and stu- 
dent performances and products. 

Greenwich is distinguished by its emphasis on evaluating degrees of 
competence as it seeks to help teachers improve their performance. 
The validity of Greenwich's process rests on its ability to appropriately 
diagnose individual teachers' needs and to accurately gauge progress 
toward more competent performance in the areas so identified. 
Evaluation for improvement allows for individualized applications of 
teaching criteria, because teacher needs are personal to the teacher and 
individual to the classroom context. Thus, the Greenwich process has 
the capacity to help a teacher develop throughout his or her career. 

Unlike processes for evaluating minimal competence, the Greenwich 
process continues to be relevant as the teacher acquires the ability to 
make professional judgments. Although the substitution of specialists 
for generalist evaluators would improve the process, mutual goal setting 
helps it to remain relevant for experienced teachers. 

Some criteria represented in the Greenwich Guidelines for Profes- 
sional Performance can even guide the judgment of excellence. The 
guidelines include such criteria as: "uses instructional techniques that 
are current, resourceful, and challenging"; "recognizes differences in 
capacities and interests of students"; "enriches the daily program 
through a variety of interests"; "shows understanding, interest, and 
concern for students' emotional, social, and physical characteristics"; 
"develops in students a respect for learning [and] a consideration of 
the rights, feelings, and ideas of others"; and "seeks to understand dif- 
ferent sides of a question." 

In designating teachers as outstanding in their year-end assessment 
reports, Greenwich principals ensure that these designations may be 
justifiably inferred from the evaluation reports. This documentation is 
intended to protect outstanding teachers from possible reductions in 
force. The validity of these judgments for this use has yet to be tested. 
The outcome will prove informative. 

Given current public interest in diversifying the uses of teacher 
evaluation for personnel decisionmaking, we discuss below the poten- 
tial validity of these processes for other types of personnel decisions. 
Indeed, all of these districts make differentiated staffing decisions 
when they select senior teachers, teacher leaders, consulting teachers, 
ITIP trainers, peer advisers, and so on. Yet none uses its teacher 
evaluation process for selecting these teachers. Why not? 

These districts use committees of teachers and administrators to 
select differentiated staff on the basis of administrator and peer recom- 
mendations. In Toledo, for example, the Intern Review Panel selects 
consulting teachers who have been recommended for their teaching 
excellence, creativity in teaching, school leadership, self-confidence, 
ability to handle emergencies, ability to generate ideas and solutions, 
and human relations skills. 

Peer advisers in Salt Lake City are nominated by principals and 
teachers and selected by a committee on the basis of their teaching 
ability, interpersonal skills, and discretion in dealing with peers, stu- 
dents, parents, and administrators. Senior teachers in Greenwich are 
self-nominated and selected by a committee of school-level administra- 
tors and teachers on the basis of their ability to express themselves and 
to motivate students, as well as their subject-matter knowledge. 

Although the screening processes for these positions would likely 
eliminate teachers who had received poor evaluations, the evaluation 
processes would not provide recommendations for this special status. 
First, the evaluation processes do not produce the kinds of information 
about teaching competence that would be needed to differentiate 
between good, better, and outstanding teachers. But more important, 
the differentiated staff roles require a wider range of talents than those 
exhibited in the classroom. 

In sum, the evaluation processes of these four districts do not suit 
the selection of differentiated staff. Thus, LEAs seeking innovative 
personnel policies must decide whether their goal is to reward teachers 
for classroom performance or to select teachers for leadership posi- 
tions. The former requires assessing degrees of competence; the latter 
requires more. 



UTILITY 

The utility of teacher evaluation depends in part on its reliability 
and validity, that is, on how consistently and accurately the process 
measures minimal competence and degrees of competence. The utility 
of evaluation depends also on its cost, that is, on whether it achieves 
usable outcomes without generating excessive costs. The results must 
be worth the time and effort used to obtain them if the process is to 
survive competing organizational demands. At least three types of 
costs—logistic, financial, and political — should be considered in assess- 
ing utility. 

Logistic costs: Evaluation procedures, if overly complicated, threaten 
utility. A process too cumbersome to provide timely results loses its 
utility. If procedural demands exceed staff capabilities, evaluation is 
implemented poorly and its results are not usable because they are not 
reliable or valid. A process that is too complicated or too time- 
consuming to be properly implemented has low utility where teacher 
organizations can block dismissal attempts on procedural grounds. 
Equally important, excessively complicated procedures dilute evalua- 
tion resources, making them less available for improvement purposes. 

Financial costs: As resources devoted to evaluation increase, so must 
the perceived, observable benefits of evaluation. If the financial costs 
of the process exceed its perceived benefits, utility suffers. Sooner or 
later, the system will commit less time and money to the process so as 
to accommodate other system demands, and the process will lose its 
utility. The evaluation process must be cost-effective enough to allow 
for a sustained level of effort over time. 

Political costs: Useful evaluation requires political acceptability. A 
process may be theoretically valid and reliable, but if it is not endorsed 
by those who control political power, the use of its results will lead to 
struggles that divert organizational energies from system goals. Simi- 
larly, if the process undermines the ability of important constituents 
(teachers, parents, or administrators) to legitimately influence the 
teaching-learning environment, it will breed dissension or low morale 
that adversely affects the larger organizational mission. 

Utility represents a proper balance of costs and benefits. The bene- 
fits include the provision of data for decisionmaking, better communi- 
cation, and personnel improvement. 

Data for decisionmaking: The evaluation process must produce data 
of sufficient quality and relevance that administrators, teachers, and 
others will use the information in making personnel and organizational 
decisions. 

Better communications: The evaluation process should promote 
communication among members of the organization. To the extent 
that it provides opportunities for disseminating organizational goals, it 
will help to maintain and improve the organization. 

Personnel improvement: To the extent that the evaluation process 
leads incompetent performers to depart and competent performers to 
improve, the quality of teaching and instruction will improve. 

The design and implementation of teacher evaluation processes 
depend on these aspects of utility. However, they are rarely considered 
in the literature, which treats issues of reliability and validity in isola- 
tion from real-world complexities and constraints. Many theoretically 
and technically sound evaluation systems fail in their implementation 
because they do not take into account the logistic, financial, or political 
realities that ultimately determine their usefulness. 

The evaluation processes in the four case study districts achieve 
higher utility than most, since their results are used, and the processes 
have proved cost-effective enough to remain viable (and relatively well 
implemented) over time. The components of utility, though, are not 
identical across districts or stable over time. As politics shift and the 
context and purposes of evaluation change, the utility of a given 
approach fluctuates also. Below we discuss the utility of the four dis- 
tricts' evaluation processes. 

Toledo 

Judged in terms of its relatively narrow focus, Toledo's process has 
high utility. The intern and intervention programs succeed in assisting 
teachers to achieve acceptable teaching competence, or in removing 
them from the classroom if they do not. The process does both of 
these things without disrupting the system's operations or lowering the 
morale of school personnel. 

Three critical features ensure the utility of the Toledo process: (1) 
It is carefully managed, and it is conducted by evaluators who have no 
other, competing responsibilities; (2) it is focused and it uses limited 
resources to reach a carefully defined subset of teachers; and (3) it is a 
collaborative effort and it engages the key political actors in the design, 
implementation, and ongoing redesign of the process. 

By giving consulting teachers released time and limiting the number 
of interns each evaluates, the process provides more and closer supervi- 
sion of teachers being evaluated and increases usefulness of evaluation 
for both individual interns and for decisionmakers. The process pre- 
cludes the all-too-common type of evaluation characterized by last- 
minute observation or no observation at all, poor documentation, and 
missed deadlines. 

By focusing the intern and intervention programs on two specific 
subsets of teachers needing special assistance, the process is cost- 
effective in a particular sense. Although supervising each intern or 
intervention teacher cost a relatively high $2000 on average in the 
first two years of the program's implementation, the process showed 
a relatively low overall cost and provided substantial substantive 
and political benefits. 

The process ensures that only competent teachers enter the profes- 
sion and that incompetent teachers are rejected if they show no 
improvement. These are the basic aims of teacher evaluation. For the 
general public, as well as for the school system and the teaching profes- 
sion, a process that achieves these two complementary objectives has 
high utility. 

By targeting resources on teachers who most need supervision, the 
process provides a cost-effective means of facilitating the organization's 
work. Inchoate efforts to handle the problems caused by a small 
number of incompetent teachers cause institutional confusion and 
divert considerable professional resources from instruction. In such 
cases, the organization must deal with the results of the problem rather 
than its source, and school operations suffer. 

In contrast, a system that intensely supervised all teachers would 
waste valuable resources on many who did not require assistance; these 
resources also could be used more profitably for actual instruction 
rather than the monitoring of instruction. For accountability purposes 
at least, the Toledo intern and intervention programs have high utility: 
They achieve their goals without diverting resources from other aspects 
of the organization's mission. 

Finally, because the Toledo programs are a joint venture of union 
and management, the political climate for implementation is more 
positive than would otherwise be the case. A review board handles 
administrators' and teachers' concerns, and procedural mechanisms 
ensure carefully conducted, fair supervision. As a consequence, the 
results of the process and the process itself are not subject to continual 
grievances. If the district terminates a teacher's contract, the union 
does not initiate proceedings against the district (it will, however, 
represent a teacher who requests legal assistance). This positive politi- 
cal atmosphere contributes to the programs' utility. 

The small size of the intervention program also contributes to the 
political acceptability of the process. While some might argue that a 
program involving so few teachers can have little effect on organiza- 
tional improvement, others hold that a program of broader scope might 
threaten organizational stability and morale. Toledo has balanced 
accountability and improvement needs by providing other voluntary 
vehicles for assistance that are not linked to personnel decisions. 

In sum, the intern-intervention approach has high utility because it 
effectively targets resources on a small but important aspect of teacher 
supervision. It does so with the full cooperation of union and manage- 
ment and with increasing acceptance and approval by school personnel. 

Salt Lake City 

Salt Lake City's remediation process also has fairly high utility for 
accountability purposes, although it seems to provoke more anxiety on 
the part of teachers than does Toledo's process. This anxiety may 
stem from the fact that the principal alone makes the decision to move 
a teacher from the informal accountability process into the remediation 
process, and he makes the decision on grounds that are not uniform 
and prespecified. Or, it may stem from the fact that a teacher may be 
identified as a possible candidate for remediation because of a "review 
of service" request initiated by anyone in the school community. 

The relative lack of standardization in evaluation prior to remedia- 
tion does not seem to have resulted in the assignment of the wrong 
teachers to remediation. Neither has it weakened the use of the 
remediation process for personnel decisions. Over a nine-year period, 
remediation resulted in the removal of 37 teachers from a force now 
numbering 1100. Nearly that number were successfully remediated. 
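
The scale implied by these figures can be worked out with a few lines of arithmetic. The following is only an illustrative sketch: the variable names are hypothetical, and because the text gives the size of the force "now" rather than over the whole nine years, the resulting share is a rough approximation and not a figure reported by the study.

```python
# Figures quoted in the text for Salt Lake City's remediation process.
removed = 37        # teachers removed through remediation
years = 9           # length of the period covered
force_size = 1100   # current size of the teaching force (an approximation
                    # for the whole period; the text gives only today's size)

# Removals as a share of a force of this size, and the average per year.
removal_share = removed / force_size
removals_per_year = removed / years

print(f"Removals as a share of the force: {removal_share:.1%}")
print(f"Average removals per year: {removals_per_year:.1f}")
```

On these assumptions, removals amount to a little over 3 percent of a force of the present size, or roughly four teachers a year.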

The financial costs of the remediation process are fairly low, since it 
relies in large part on the services of people receiving modest stipends 
or substitute pay. A four-member remediation team observes, advises, 
and evaluates a teacher for a period of up to five months, but the team 
members also have full-time responsibilities elsewhere. An additional 
teacher, drawn from among retired teachers or teachers on leave, may 
be hired full-time for up to a month to help the teacher on remedia- 
tion. 

Furthermore, although teacher association leaders express some con- 
cern about the role conflicts inherent in the teacher evaluation system, 
they accept it in the context of shared governance in Salt Lake. Under 
shared governance, teachers are members of the remediation teams and 
participate in virtually every aspect of school operations. Teacher con- 
trol over curriculum decisions and involvement in other teaching policy 
decisions indirectly enhance the utility of the teacher evaluation pro- 
cess by legitimizing its main purpose: to ensure that incompetent 
teachers are removed from the school system. 

Lake Washington 

The utility of Lake Washington's teacher evaluation process for 
identifying, assisting, and, if necessary, removing incompetent teachers 
from the classroom is fairly high. Over a five-year period, the proba- 
tionary process has directly resulted in four teachers leaving classroom 
teaching and in seven teachers improving sufficiently to be reinstated 
in the classroom on continuing contracts. The overall evaluation pro- 
cess has facilitated the counseling out of an additional 56 teachers, the 
placement of 21 teachers on leave of absence, and the nonrenewal of 
nine expired contracts. Lake Washington teachers and administrators 
agree that the process works fairly to facilitate personnel decisionmak- 
ing related to minimal competence. 
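
The outcome counts in this paragraph can be tallied as follows. This is a minimal sketch using only the numbers quoted above; the dictionary keys are hypothetical labels, and the grouping into "probation" and "other" outcomes is an assumption made for illustration, not a classification used by the district.

```python
# Outcomes attributed directly to the probationary process (five-year period).
probation_outcomes = {
    "left_classroom_teaching": 4,
    "reinstated_on_continuing_contract": 7,
}

# Additional outcomes attributed to the overall evaluation process.
other_outcomes = {
    "counseled_out": 56,
    "placed_on_leave_of_absence": 21,
    "contract_not_renewed": 9,
}

probation_total = sum(probation_outcomes.values())
other_total = sum(other_outcomes.values())

# Share of probation cases that ended in reinstatement.
reinstated_share = (
    probation_outcomes["reinstated_on_continuing_contract"] / probation_total
)

print(f"Probation cases: {probation_total}")
print(f"Other evaluation-related outcomes: {other_total}")
print(f"Share of probation cases reinstated: {reinstated_share:.0%}")
```

The tally shows that formal probation accounts for only a small part of the personnel outcomes the text attributes to evaluation; most occur through less formal channels such as counseling out.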

Despite Lake Washington's tough-minded approach to evaluation, 
the political costs have not yet proved unbearable. Most teachers con- 
sider the evaluators fair, equitable, and consistent. The ongoing train- 
ing provided to administrators has minimized teacher perceptions of 
individual evaluator bias. Union representatives have said that "if an 
administrator uses the procedure correctly, we are not going to be 
against them." 

The financial and logistic costs of this process are large. Of the four 
districts in the study, Lake Washington invests the greatest amount of 
resources in teacher evaluation, particularly because its staff develop- 
ment expenditures must be included. (Greenwich also makes a major 
investment in staff development, but its operation is separate from 
teacher evaluation.) 

Lake Washington elementary school principals spend an average of 
26 percent of their time on evaluation; secondary administrators spend 
some 15 percent. The staff development budget was increased to $1 
million in 1983-1984. This allocation finances the ITIP program for 
the development of the instructional ability of teachers, other in- 
service training for teachers, individual teacher and school staff 
development, and administrator training in a variety of areas, including 
clinical supervision skills. 

These expenditures of time and money produce visible benefits. The 
ITIP precepts that guide staff development for principals and teachers 
bring cohesiveness to an activity that is usually fragmented and erratic 
and help teachers and their supervisors to identify and clarify prob- 
lems. The ITIP framework also gives them tools and a common van- 
tage point for developing pragmatic solutions. The investment in staff 
development thus increases the utility of teacher evaluation. 

However, the highly specified, time-consuming, and cumbersome 
procedures for evaluation decrease its utility in two ways. First, the 
procedures discourage the use of probation in many instances where 
both teachers and administrators feel it is called for. Second, they 
leave little time for attention to the needs of competent teachers. 

The probationary procedures prescribed by state law consume con- 
siderable time. District practices require additional time. Principals 
must continually assess teacher response to their personal growth plan, 
and they must observe and meet at least once a week with the proba- 
tionary teacher. This enormous investment of time conforms to the 
district philosophy of doing everything possible to help a teacher 
improve. 

The enormous investment of time also means that, regardless of the 
actual state of teaching in a school, principals believe that they can 
deal with no more than one teacher on probation at a time. A number 
of principals frankly admit that they are often forced to transfer inef- 
fective teachers rather than to place them on probation. Lake Wash- 
ington teachers also believe strongly and with surprising consistency 
that the system tolerates incompetent classroom performance. 

The procedural requirements for teacher evaluation in Lake Wash- 
ington, which emanate largely from the state law, prevent district 
administrators from devising a more productive evaluation strategy. 
District teachers and administrators believe that teacher evaluation 
requires differentiated practices to reflect teacher skill and needs. The 
utility of the evaluation process is reduced by the need to minimally 
evaluate all teachers for the same amount of time every year, as the 
state requires. This procedural uniformity results in pro forma evalua- 
tions in many cases and lack of special attention to excellence, and it 
prevents administrators from directing evaluation resources where they 
are most needed. 

As the overall quality of teaching in Lake Washington has improved, 
the need for differentiated teacher evaluation has increased. The 
decreased utility of the current process stems in part from the rigidity 
of its procedures in the face of changing purposes and needs. 

Greenwich 

The Greenwich teacher evaluation system requires that every 
teacher engage in goal setting, consultation, observation, and evalua- 
tion every year. To permit adequate time for teacher evaluation, 
teacher leaders are assigned to schools to maintain a ratio of about one 
evaluator to 20 evaluatees. We estimated that principals would have to 
spend 9 percent of their time to minimally meet the demands of the 
process. The time spent by principals, other building-level administra- 
tors (assistant principals, certain department chairpersons), and 
teacher leaders represents a major commitment of resources. 

The major goal of the system is to improve teaching. Unlike the 
other three districts, where the departure of some incompetent teachers 
presumptively raises the quality of teaching, Greenwich cannot as 
easily quantify the effects of its system. Greenwich annually surveys 
its teachers about their perceptions of the fairness and utility of the 
teacher evaluation process. About half report that the system helps 
them improve their teaching performance. While reports of improved 
performance do not always mean improved performance, they may 
indicate feelings of efficacy that ultimately improve performance.^ 

The use of joint goal setting and teacher self-evaluation (along with 
administrator evaluation) increases the likelihood that teachers will 
find the process useful. Although many evaluation systems use goal- 
setting procedures, they do not always specifically address the teachers' 
own immediate concerns, classroom situations, and areas in which 
there is already a felt need for improvement. 

The Greenwich system not only enables the school system to engage 
the individual teacher, it does so in a manner that relates directly (or 
at least should relate) to the teachers' daily professional endeavors. 
Thus, the utility of the Greenwich evaluation process results from its 
ability to tap teacher motivation and desire for self-improvement and 
to reward teachers' efforts by acknowledging their importance. 

The ability of the evaluation process to provide this stimulus for 
improvement justifies its financial and logistic costs. However, to the 
extent that the process loses its relevance to many teachers, its expen- 
ditures of time and effort produce less and its utility diminishes. The 
current trend to replace teachers' personal goals with system goals may 
be having this effect. 

The Greenwich teacher evaluation system is not designed to serve 
accountability purposes. However, Greenwich is currently trying to 
standardize the process and the criteria for evaluation so that it can 
use the results of evaluation as a basis for individual personnel or job 
status decisions. Although we found little evidence that the process 
has been used for personnel decisions, many teachers and some 
administrators believe that its use for that purpose conflicts with its 
use for improvement. Teachers and some evaluators are selecting 
meaningful personal goals more cautiously, thus reducing the value of 
the system. 

^Other research suggests that teachers typically do not attribute positive effects on 
their teaching to teacher evaluation processes (see, e.g., Natriello and Dornbusch, 
1980-1981). 

Politically, the Greenwich teacher evaluation process has cost little 
and provided few benefits. The process does not produce the kinds of 
tangible outcomes that have great meaning to the public. While the 
perceived benefits to the school system have sufficed to support addi- 
tional resources for evaluation (primarily in the form of teacher 
leaders) over many years, the perceived utility of evaluation has not 
sufficed to protect evaluation time against other organizational 
demands. Both principals and teacher leaders complain that other
administrative duties reduce the time available for teacher evaluation. 
In assigning teacher leaders other administrative functions, the 
Greenwich educational authorities appear to have somewhat devalued 
the evaluation function. 

The political costs of the evaluation process may be expected to 
increase if and when results are used for personnel decisions. If the 
process is well enough adapted to this new purpose and resources are 
increased to meet reliability demands, the political benefits of the pro- 
cess may also increase. Teacher support for the evaluation system will 
depend upon how well it continues to fulfill its traditional purpose as 
well as its new objectives.

In sum, the utility of a teacher evaluation system depends on how 
well and how fairly it measures what it seeks to measure, whether the 
school system can and will tolerate its logistic and financial costs, and 
whether it functions so as to be acceptable to the relevant political 
forces. The utility of a given approach changes as the politics, context, 
and purposes of evaluation change. 

Toledo will, in a few years, have to hire many more new teachers. 
With that, the cost of the intern program as currently implemented 
will rise substantially. Will the program survive? Will Salt Lake's 
unusual shared governance and nonstandardization survive without the 
current superintendent's leadership? Will Lake Washington, which 
has attained prominence through standardization, find that its procedures
must give way to a differentiated evaluation strategy? How
will Greenwich resolve its competing demands for professional growth 



and accountability?

The utility of a specific teacher evaluation approach will vary over
time. School districts, we suggest in the final section, should proceed
analytically.

V. CONCLUSIONS AND RECOMMENDATIONS



We undertook this study to find teacher evaluation practices that 
produce information that school districts can use for helping teachers 
to improve and/or for making personnel decisions. We described in 
this report four evaluation procedures that achieve these primary objec- 
tives. 

Our conclusions and recommendations constitute a set of necessary,
but not sufficient, conditions for successful teacher evaluation. Educational
policies and procedures must be tailored to local circumstances.
Our conclusions and recommendations, therefore, may be best thought
of as heuristics, or starting strategies to be modified on the basis of
local experience.

Conclusion One: To succeed, a teacher evaluation system must
suit the educational goals, management style, conception of teaching,
and community values of the school district.

As obvious as this conclusion may appear, the educational landscape 
is nevertheless littered with the remnants of unsuccessful procedures 
produced by bygone fads, administrators, and policies. The procedures
failed (that is, lost their relevance and ceased to be faithfully
implemented) in part because they did not serve the school system's
more fundamental operating assumptions.

In each of the study districts, the teacher evaluation system worked
as intended because it matched the fundamental operating assumptions
of the district's educational goals, management style, conception of
teaching, and community values. Where a district's ethos and operating
assumptions were changing, we saw evidence of strain in the
implementation of teacher evaluation.

This conclusion suggests that a school district that values uniformity 
of instruction and emphasizes standardized testing as the measure of
goal attainment should not adopt a teacher evaluation process that
allows multiple definitions of teaching success. A district that values
multiple outcomes of teaching and learning should not use standardized
test scores for evaluating teachers. 

A highly centralized, bureaucratic district should probably not adopt
a teacher evaluation process that allows individual teachers to set their
own goals; a highly decentralized district should probably not use an
evaluation process that stresses adherence to centrally determined
goals and uniform curricular objectives. A district that wants teachers
to take responsibility for their own professional development should
probably use teachers as well as administrators as evaluators.

A district in which management values predominate probably cannot
for long delegate evaluation responsibility to a teachers' association or
to individual teachers. A district with a strong teachers' association
probably cannot usefully use a traditional hierarchical approach to
evaluation.

Based on the conclusion that a teacher evaluation system is more
likely to succeed if it suits a district's fundamental operating assumptions,
we recommend:

1. The school district should examine its educational goals,
management style, conception of teaching, and community
values and adopt a teacher evaluation system compatible with
them. It should not adopt an evaluation system simply
because that system works in another district.

2. States should not impose highly prescriptive teacher evaluation
requirements.

Conclusion Two: Top-level commitment to and resources for
evaluation outweigh checklists and procedures.

This simple conclusion may be the most important of the study. 
Successful teacher evaluation demands commitment and resources.
The top leaders of the school administration and/or the teachers' organization
must commit themselves to evaluation, and the school district
must translate their commitment into resources. Without commitment
and resources, and the activities that they stimulate, teacher evaluation
becomes a formal, meaningless exercise.

Some educators believe that good teacher evaluation requires no
more than finding the right checklist. They collect and compare forms
and choose one. Then they discuss such relatively minor details as
whether the evaluator must spend an entire class period observing or
whether the teacher should have advance notice.

We found that the form and procedure of the relatively few success- 
ful teacher evaluation systems vary little from those of the less success- 
ful systems. The successful ones are, however, distinguished by their 
seriousness of purpose and intensity of implementation. Many school 
districts evaluate teachers solely to comply with state law or regulation; 
others, solely to respond to community sentiment. Under these
circumstances, which are more prevalent than most will admit,
evaluation requires nothing more than formal compliance and minimal
resource commitment. This approach cannot produce successful
teacher evaluation, because it does not integrate evaluation into
decisionmaking or give it priority.

Since evaluation is both a difficult and inherently uncomfortable 
activity, it needs explicit mechanisms to make it important — that is, to 
ensure that it receives high priority. Without such mechanisms,
evaluators tend to put it aside for more immediate, and perhaps less
onerous, demands on their time. When evaluation is not given priority,
its quality and intensity are reduced and its results cannot be used
for personnel decisionmaking or improvement purposes.

For school districts to obtain the commitment and resources needed 
to make evaluation important and useful, we recommend: 

3. The school district should give evaluators sufficient time,
unencumbered by competing administrative demands, for
evaluation. This may mean assigning staff other than the
school principal to some evaluation functions.

Time is the main resource for teacher evaluation. Evaluators need
time to make reliable and valid judgments and to offer assistance. 
Administrators and teachers who evaluate other teachers must not 
have urgent competing responsibilities that take precedence over 
evaluation. 

The school district must create an incentive structure that
encourages and allows evaluators to evaluate thoroughly. That is, having
mandated teacher evaluation, the district must provide time for
evaluation. It must create time either by giving evaluation a higher
priority than that of competing responsibilities or by assigning additional
evaluators. All of our case study districts solved this problem by
assigning expert teachers to some aspect of the evaluation process,
particularly to providing more intensive supervision to teachers most
needing assistance.

Having allocated the time, the district must take steps to ensure
that evaluators use the time well. For this purpose, we recommend: 

4. The school district should regularly assess the quality of
evaluation, including individual and collective evaluator competence.
The assessments should provide feedback to individual
evaluators and input into the continuing evaluator training
process.

The district must review evaluations both to increase their reliability
and to ensure their timeliness. The evaluation of teachers with whom 
the evaluators must continue to work may create conflict. Evaluators, 
particularly principals, face competing considerations: On the one
hand, they may want to overrate teachers so as to preserve harmonious
working relations in the school. On the other hand, they may want to
deal with the unpleasantness associated with teacher evaluation by
deferring evaluation.

The district must therefore reinforce evaluators to conduct reliable,
valid, and timely reviews as part of its strategy for creating a proper
incentive structure. Reinforcement may take the form of evaluating
principals (and other evaluators) on the basis of how well they evaluate
teachers, creating a central office position or committee to oversee
evaluation reporting, and/or developing a formal mechanism for monitoring
and periodically revising the evaluation process.

Because teacher evaluation is a judgmental rather than a scientific 
process, it must be conducted fairly. This means that evaluators must
share a common understanding of the process, its implementation, and
the assumptions on which its reliability and validity rest. Moreover, as
time passes, the actual implementation of the evaluation process may
change as experience grows. Thus, the nature and quality of implementation
must be monitored. Evaluators need regular, periodic
opportunities to share their understanding of the purpose and process.
Therefore, we recommend: 

5. The school district should train evaluators in observation and
evaluation techniques, including reporting, diagnosis, and clinical
supervision skills, when it adopts a new teacher evaluation
process.

Furthermore, a shared understanding of the criteria on which judgments
of teaching are made must be developed and maintained by providing
continuing opportunities for evaluators to discuss the teaching
assumptions underlying evaluation criteria and to review actual evaluations
with each other and their superiors. The content of evaluator
training (and, indeed, the choice of evaluators) must suit the major
purposes of evaluation.

Although we consider checklists and procedures less important than 
commitment and resources, we nevertheless advise districts to pay 
attention to them. These technical details focus discussion. In the 
process of agreeing on evaluation form and substance, evaluators
develop a mutual understanding about teaching in their district and a
common language of analysis and interpretation. Evaluation provides
one opportunity to establish and communicate a philosophy of teaching.
This philosophy may involve not only training, administrative
leadership, and resource allocation, but also the details of what makes
good teaching.



Conclusion Three: The school district must decide the main purpose
of its teacher evaluation system and then match the process to
the purpose.

Teacher evaluation serves multiple purposes, and a school district
may be tempted to try to serve all of its purposes with one set of
evaluators, using a single instrument and a single process. Furthermore,
a district may not want, for political reasons, to say that its goal
is helping all teachers to improve if this means that it will appear to be
rejecting the elimination of incompetents as its main purpose. Conversely,
if its main purpose is eliminating incompetents, it would not
want to seemingly reject helping all teachers to improve. Many districts,
therefore, publicly proclaim that they are addressing all concerns.

With the new interest in merit pay and master teachers, we may 
assume that many school districts will try to use one evaluation system 
for both traditional and these new purposes. Yet, most of the literature
questions whether a single evaluation system can handle both formative
(improvement-oriented) and summative (decision-oriented)
evaluation. It suggests that decision-oriented evaluation would intimidate
rather than help teachers and that improvement-oriented evaluation
produces data unsuited to personnel decisions. This explanation,
while correct as far as it goes, fails to fully explain the dynamics.

Our case studies reinforce the conclusion that a single teacher 
evaluation process can serve only one goal well. Sometimes an aspect 
of a process can serve both decisionmaking and improvement purposes
for a small subset of teachers (e.g., in a remediation program); however,
a single process cannot meet the goals of judging and improving all
teachers. The reasons for this become clear when we examine the
demands associated with several evaluation purposes.

Evaluation for improvement, if it is to meet the needs of all teachers,
must be flexible, for, like individualized instruction, it must take
each teacher where he or she is and help him or her improve. It must
encourage teachers to develop. Criteria must be broad enough and rating
scales must have sufficient range to accommodate all.

To be helpful to the teacher, the evaluation process must take into
account the specific teaching context. The outcome of the process is
advice to the teacher. It is not important, indeed it is not necessary,
possible, or realistic, for school administrators to expect to be able to
compare teachers under this type of evaluation. The flexibility needed
to provide useful personalized advice to a teacher precludes comparisons
or rankings of teachers. If the purpose were narrowed to helping
only those who are judged to need it, the process would begin to
acquire some of the characteristics associated with other purposes
which, because they compare teachers, require a higher order of reliability
and a different kind of validity.

Evaluation for the possible termination of employment has different 
requirements. The criteria and the ratings must be designed to allow
decisions about minimally acceptable teaching behaviors. The evaluation
task is to distinguish competent from incompetent teachers. The
basis for this distinction must be clear. Hence, the school district must 
specify the criteria, behavioral bases for ratings, and procedures. The 
bureaucratic demand is for a common scale on which all teachers may 
theoretically be compared, but the real need is for a list of teaching 
behaviors that all teachers except the incompetent will exhibit. In 
practice, this means that judgments typically rest on assessment of 
generic teaching skills.

The use of generic teaching skills as the basis for evaluation implies 
that the evaluator need not know much about the subject matter and
grade-level pedagogical demands. Thus, a generalist principal can 
evaluate all teachers under his or her jurisdiction. Presumptive fair- 
ness means that the principal can observe all teachers for relatively 
short periods of time, noting that most teachers have the minimal 
skills but that the incompetent do not. Having made this determina- 
tion, the principal (or district administration) may then concentrate 
evaluation resources on those who may be judged incompetent. 

To spend substantial evaluation resources on all teachers in this 
approach would be wasteful since, by virtue of the focus on minimum 
skills (skills that, by definition, most teachers have), the process is 
irrelevant to the needs of most teachers. The school district can con- 
centrate evaluation resources on helping the probationary teacher to 
master the minimum skills or, if this help fails, on making the final 
judgment of incompetence. It can offer personalized assistance using 
context-specific applications of the teaching criteria for improvement
or remediation. 

The final determination of incompetence, however, must be seen to 
be reliable. The probationary teacher must be judged by standardized 
indicators. Multiple samples of the teacher's behavior must be taken.
In sum, the judgment must be reliable enough to stand up in a court of
law, where a termination decision might be appealed.

Improvement and termination pose different evaluation demands.
They require trade-offs between breadth and depth of coverage and
between standardized and context-specific notions of acceptable, good,
and better teaching. Bureaucratic and external public demands differ.
The failure to clarify the purpose, or to match the process to the purpose,
may undo the effectiveness of a teacher evaluation system. The
case study districts explicitly or implicitly made their choices. If a
school district wants to serve more than one purpose, it may need to
establish more than one process. We recommend:

6. The school district should examine its existing teacher evalua- 
tion system to see which, if any, purpose it serves well. If the 
district changes the purpose, it should change the process. 

7. The school district should decide whether it can afford more
than one teacher evaluation process or whether it must choose 
a single process to fit its main purpose. 

Although our study was restricted to school districts that used
teacher evaluation for individual improvement and personnel decisions,
we believe that some of what we learned applies to teacher evaluation
for other purposes, such as decisions regarding merit pay or master
teachers. Decisions that involve pay and promotion and publicly differentiate
among teachers usually receive a high level of scrutiny and
therefore require procedures that all parties perceive as reliable and
valid.

The award of merit pay, while not as serious as a dismissal decision,
nevertheless has visible consequences: It will label some teachers meritorious
and others, by default, unmeritorious. The latter group will
then tend to scrutinize the process, especially when every teacher is
evaluated every year.^ Thus, the award of merit pay establishes the
need for rigor in teacher evaluation to sustain the credibility of the
process, a rigor that approaches the level required for dismissal decisions.
A school district that intends to evaluate all teachers annually
for merit pay decisions must commit substantial resources to evaluation.

Teacher evaluation to sustain master teacher appointments requires 
a somewhat smaller commitment of resources than that for merit pay.
The evaluation process still demands rigor, but it will affect a smaller
percentage of teachers in any given year. Thus, the school system will 
be able to concentrate its evaluation resources. 

If the school district intends to consider most teachers for either
merit pay or master teacher status (after a few years of experience), the 
evaluation system may resemble the system for termination; it need 
identify only those few to be denied merit pay or promotion. However, 
if only a fraction of teachers are to receive merit pay or master teacher 
status, the demands for reliability, validity, and public defensibility 
increase significantly. 

^Some might argue that the award of merit pay could or should be kept confidential.
Such a policy does not seem likely in the freedom-of-information era.

Evaluation for termination must reliably distinguish between inadequate
and minimally adequate teachers; evaluation for excellence must
reliably distinguish between marginally excellent and merely highly 
competent teachers. However, whereas a standard list suffices to dis- 
tinguish low levels of competence, distinguishing among high levels of 
competence requires multiple criteria, expertly evaluated. 

Excellent teaching is, by definition, rare; it is distinguished by judgment,
intuition, insight, creativity, improvisations, and expressiveness.
While criteria and scales can be devised to measure these intentional 
behaviors, evaluating their presence requires reliability; unreliable 
results are likely to be challenged. 

The evaluation of excellent teaching, we believe, requires judgments 
by experts rather than by generalists. Whereas principals can evaluate
for performance improvement (where the need for reliability is relatively
low) and can evaluate for termination decisions (where the criteria
are the least common denominators of teaching), the judgment of
excellence requires an expert. Excellent teaching, we submit, cannot
be judged in the abstract as is generic teaching competence. To judge
excellence, an evaluator must know the subject matter, grade level, and
teaching context of the teacher being evaluated.

In other words, skilled mathematics teachers are needed to judge
excellence in mathematics teaching. Skilled elementary school teachers
are needed to judge excellence in elementary teaching, and so on.
Moreover, the evaluation of excellence calls for multiple samples of the
teacher's behavior either by the same expert or by several experts. The
dual requirements of expertness and reliability demand a teacher
evaluation process based on either peer (or, more likely, master
teacher) review or review by subject-matter supervisors.

Conclusion Four: To sustain resource commitments and political
support, teacher evaluation must be seen to have utility. Utility
depends on the efficient use of resources to achieve reliability, validity,
and cost-effectiveness.

For a teacher evaluation system to be useful to the district and credible
to teachers, administrators, and the community, it should offer a
plausible solution to the major perceived problems or needs of the
teaching force. We saw in the case study districts that all participants
supported (or at least accepted) the teacher evaluation systems. For a
system to take hold and last, it must earn and retain the support of all
participants. All participants are more likely to support a system that
meets their needs.

The selection of a teacher evaluation system depends in part on the
composition of the existing and anticipated teaching force. A district
that will not be hiring for a decade needs an evaluation process that
suits an experienced staff. A district with increasing enrollments or a
teaching force rapidly approaching retirement may need an evaluation
process that will improve hiring. A contracting district may need to
consider performance-based reductions in force. A district with an
even distribution of age and experience might choose an evaluation
system that differs from one that might be used where age and experience
are clustered.

The selection of a teacher evaluation system depends to an even
greater extent on the perceived quality of the teaching force. The composition
of the teaching force and perceptions of its quality determine
which problems and needs the district should try to solve by teacher
evaluation: general improvement, improvement of certain categories of
teachers, identification of incompetence, assessment of relative competence,
induction of new teachers, retention of more experienced
teachers, rewarding outstanding performance, or selection of master
teachers. If the district chooses a teacher evaluation system that
addresses its needs, all involved are more likely to consider the evaluation
system worthwhile.

The utility of teacher evaluation is difficult to assess. School districts
do not keep their books so as to permit the calculation of the
true cost of teacher evaluation. While some school districts earmark 
funds for teacher evaluation or staff development, these funds do not 
usually cover the cost associated with the time of those involved in the 
evaluation process. 

The effects of teacher evaluation may be assessed in terms of, say, 
the cost of terminating an ineffective teacher's appointment or the percentage
of teachers dismissed because of poor teaching. But some of
the most important effects may be indirect. Does the community 
believe that the school district is doing something about incompetent 
teachers and teaching? Does the school district have a mechanism for 
communicating its expectations to teachers? Are good teachers being 
recognized and reinforced? The answers to these and other questions 
may contribute to perceptions of the utility of a teacher evaluation sys- 
tem. 

In the end, a school district considering whether to adopt a particu- 
lar teacher evaluation system (or whether to eliminate one) must assess 
whether it is worth the cost. Do the results justify the human 
resources invested in it? The answer to this question depends on
administrators', teachers', and the public's perception of the quality of
the teaching force and the contribution that the teacher evaluation 
process makes to teaching quality. While the perception of each group 
to some extent reflects the group's interest, all are more likely to share
a common perception of utility if the process achieves what it sets out
to achieve. To increase the likelihood of perceived utility, we recommend:

8. The school district must allocate resources commensurate with
the number of teachers to be evaluated and the importance
and visibility of evaluation outcomes.

This recommendation extends our third recommendation: that districts
should provide sufficient time for evaluation. Despite the obviousness
of both propositions, most school districts fail to provide
resources commensurate with the scope of their evaluations. The
results therefore lack reliability, validity, and utility.

Many school systems review all teachers annually. Two bureaucratic
phenomena encourage universal annual review. First, bureaucracies,
especially public ones, must at least appear to treat all employees
(and clients) alike. If a school district wants to evaluate some teachers,
then it must evaluate all teachers so as not to discriminate. Second,
teachers' associations want to prevent school administrators from singling
out individual teachers for punitive evaluation. Hence, they often
insist, through the collective bargaining process, that all teachers be
evaluated annually.

The annual review of all teachers usually produces perfunctory
evaluations, because evaluation resources (chiefly, the time of the principals
and other evaluators) have been diluted to meet the formal
requirement. Since many participants do not believe that the requirement
leads to decisions, they do not press school systems to invest sufficiently
in the process. The circular result is superficial evaluation
that is not considered sufficiently reliable and valid to be used.

When pressed to improve teacher evaluation practices, school dis- 
tricts typically do not increase the ratio of evaluators to teachers but 
instead exhort principals to improve and increase evaluations. The 
supposedly enhanced process, while possibly occupying more time, still 
does not produce usable results.

Resource requirements depend also on the outcome sought. Results 
that decisively affect individual teachers demand a more thorough and 
reliable evaluation system than those that do not. Evaluation to help 
teachers to improve their performance, while important to teachers, 
does not affect them decisively. But evaluation used to terminate 
teachers' employment (or to make other teaching status decisions) has 
decisive effects. 

As we have seen, the dismissal of a teacher requires multiple obser- 
vations, extensive documentation, significant help to improve the 
teacher's performance, review of the decision at several levels, and due
process. The school district must be prepared to legally defend the
dismissal decision. Our next recommendation follows from these
onerous resource requirements:

9. The school district should target resources so as to achieve
real benefits.

Resources must go to the main evaluation purpose so that evaluation
will be seen as cost-effective. The failure to concentrate resources will
result in unfocused evaluation that consumes resources but produces
information that serves neither teachers nor administrators.

When evaluation may lead to dismissal, for example, the school district
must consolidate resources to provide multiple evaluations, that is,
one evaluator making multiple observations (for accuracy); multiple
evaluators making one or two observations (for fairness); or multiple
evaluators making multiple observations (for accuracy and fairness).
The failure to achieve accuracy and fairness will destroy the effectiveness
of the teacher evaluation system. When costs are perceived to
outweigh benefits, the process fails.

Conclusion Five: Teacher involvement and responsibility improve 
the quality of teacher evaluation.

The problems inherent in assigning the teacher evaluation function 
solely to principals came to our attention early in the study as we 
reviewed the literature and conducted our preliminary survey of school 
districts. Principals have a wide span of control and little time for
evaluation, and they often experience conflicts as they try to balance
their roles as school leader, supervisor, and builder of esprit de corps.
Furthermore, they do not have specialized subject-matter or pedagogical
knowledge of all teaching areas in which they are expected to evaluate
teachers. These limitations on the principal as an evaluator of
teachers often seriously impair the effectiveness of teacher evaluation
processes.

All four of our case study districts use master teachers in some
aspect of the evaluation process (and in other staff development activities
as well). Although we did not select these districts for case study
specifically because they involved highly qualified teachers in evaluation,
we are convinced that the use of peer review and/or peer assistance
greatly strengthens these districts' capacity to supervise teachers
effectively by providing additional expertise for this function.

In addition, the teachers serving in various differentiated staff roles
give their peers the kind of leadership and assistance that promotes the
development and dissemination of professional standards of practice.
In each district, expert teachers provide curricular advice, classroom
assistance, and supervision both inside and outside the teacher
evaluation process. Individually and collectively, teachers in these districts
play a more nearly professional role than they do in districts that
supervise and direct teachers through bureaucratic channels.

The involvement of the teachers' organization in the development
and oversight of teacher evaluation (and of other teaching policies)
also increases the effectiveness of an evaluation process. Particularly
in districts where collective bargaining has contributed to the working
conditions and the [illegible] of the teaching force, union participation in
designing and implementing evaluation is a virtual prerequisite for the
acceptance of evaluation results.

When developing a teacher evaluation plan, a school district must
consider issues of legitimacy and protection in their political context.
More important, the implementation of teacher evaluation is itself a
political process in which questions of credibility, due process, and
fairness continually emerge in different forms. Collaboration between
teachers and administrators in overseeing the implementation of 
evaluation can make the difference between useful evaluation results 
and stalemates. 

In all of our case study districts, the teachers' organizations have
played an important role in the design and implementation of the 
evaluation process. Their participation takes various forms, such as 
involvement in joint oversight committees, union appointment of 
teachers who assist in the evaluation process, and consultation between 
top administration officials and union leaders. As a result, the evalua- 
tion processes have enough legitimacy to produce usable results. 
Rather than seeking to constrain administrators' exercise of their 
authority through procedural requirements, organized teachers in these 
districts participate, in varying degrees, in the decisions that affect 
teachers before these decisions result in grievances. 

Because the validity and utility of teacher evaluation depend so
profoundly on who conducts and oversees evaluations, we recommend that:

10. The school district should involve expert teachers in the
supervision and assistance of their peers, particularly beginning
teachers and those in need of special assistance.

The use of expert teachers is probably the only practical way to give 
specialized help to teachers who need it. Expert teachers should be
selected on the basis of their competence as teachers and their ability
to provide supervision and assistance to adults. These experts should
work only in their own teaching area to ensure informed and relevant 
help. 






Expert teachers may be given released time (and/or additional
contract time). Such time must be allocated for supervision and assistance
and protected from other administrative duties. Of course, released time
will increase costs and cause scheduling problems, particularly at
the elementary school level. The added costs (primarily associated
with additional or substitute teachers) provide time for supervision and
assistance. The scheduling problem, while not trivial, is soluble;
districts must experiment with new scheduling patterns.

11. The school district should involve teacher organizations in the
design and oversight of teacher evaluation to ensure its
legitimacy, fairness, and effectiveness.

The evaluation roles of management and teachers' organizations in 
districts where teachers participate in decisionmaking differ from their
roles in districts that use traditional hierarchical evaluation practices.
The traditional management role of enforcing accountability is typi- 
cally seen as counterposing the traditional union role of affording pro- 
tections. Teacher participation in evaluation obscures the distinctions 
between management prerogatives and teachers' rights. When teachers 
define and enforce professional standards of practice, they significantly
reshape the traditional roles of both management and labor.

The shift from an adversarial to a participatory approach increases 
teachers' rights but also their responsibilities. It forces administrators 
to share power but gives them more freedom and legitimized authority 
to implement decisions once they are jointly made. This change 
accords teachers power over a greater range of educational matters at 
the cost of absolute protections based on work rules. Some may see 
this evolution toward professionalism as a threat to the basis of
collective bargaining. Others may view it as a more mature stage of
educational labor relations.

Districts that have reached a higher stage in labor relations can
begin to redefine traditional management and labor roles (Mitchell and
Kerchner, 1983, p. 220). This stage arrives only after the teachers'
organization has amassed sufficient power to be accepted as a partner
in policymaking. When this occurs, teacher professionalism in the
modern context may not threaten unionism. In such districts as
Toledo, where organized teachers participate in the definition of
teaching and in decisions about membership in the profession, our study
found the evolution of yet a higher stage in labor relations that goes
beyond negotiated policy to negotiated responsibility as the basis for
school district operations.

Negotiated responsibility provides the basis for a collective
professionalism more potent than the individual professionalism that existed
when unorganized teachers had only permissive authority over the
substance of their work. It opens the way to collaborative control over
teacher quality. It creates a framework within which educators
(teachers and administrators) can work together to improve the quality
of their common professional work.

Teachers analyze the needs of their students, assess available
resources, take cognizance of the school system's goals, and decide
their instructional strategies. As they instruct, they modify their
strategies to ensure that their instruction meets the needs of their
students. They use a variety of means to assess whether the students
have learned.

School districts evaluate teachers to ensure that teachers employ
appropriate standards of practice. The conclusions and
recommendations offered here are intended to lead to conditions that will help to
improve the quality of teachers and teaching.

In the bureaucratic, or traditional, conception of teacher evaluation,
the principal or another hierarchical superior of the teacher directly
inspects the work of the teacher, i.e., observes the teacher engaged in
the act of teaching. The principal typically assesses the observed
behavior against a list of criteria furnished by the central
administration. These criteria assume that teaching is planned, stable, and
predictable. The principal then [illegible] the teacher.

The professional conception involves master teachers in the
evaluation of other teachers. The master teacher helps to enforce a
professional standard of teaching. In this approach, the evaluator judges the
appropriateness of teaching decisions. It assumes that teachers know
subject matter and child development sufficiently well to make
appropriate decisions for different students and classes.

Rather than attempting to force a consensus on a single proper
standard of practice, the professional approach operates on a consensus of
what is improper or inappropriate practice. In the absence of agree- 
ment on the one best system of instruction, master teachers sanction 
different standards of practice. Different circumstances and different 
teachers' personalities may lend themselves to different methods of 
instruction. But under no circumstances does the approach tolerate 
inappropriate educational practice. 

Quality control through the enforcement of a professional standard 
of practice differs from quality control through prescribed curriculum 
and standardized testing. Both approaches contain risk. Bureaucratic 
policymaking makes teaching less attractive, thus lowering the quality
of the teaching force, which, in turn, causes districts to become more
prescriptive in a vain effort to improve education.

The professional approach relies on people and judgments. It places 
more weight on the development of client-responsive practices than on
the definition of standardized practice. It weeds out those unable or
unwilling to develop competence, rather than controlling their damage 
by prescriptions for performance. It assumes that others will become 
more capable by engaging in the joint construction of goals, definition 
of standards of good practice, mutual criticism, and commitment to 
ongoing inquiry. It supposes that investing in staff development, 
career incentives, and evaluation, i.e., in teachers themselves, will
improve the quality of teaching. 

The bureaucratic approach has heavy costs; the time has come to try 
the professional approach to evaluation. We recommend, therefore: 

12. The school district should hold teachers accountable to
standards of practice that compel them to make appropriate
instructional decisions on behalf of their students.